In 1946, R. L. Anderson was asked to set up a course in mathematical
statistics with an applied viewpoint for the Department of Experimental 
Statistics at North Carolina State College. Since no textbook of this 
type was then available, a set of notes was prepared in consultation with 
W. G. Cochran. These notes were mimeographed and later became a 
part of the Institute of Statistics Mimeo Series. This material borrowed 
heavily from the notes taken by the author from Professor Cochran at 
Iowa State College. About the same time, T. A. Bancroft was organizing
a set of notes similar to Part I, plus the material on regression in this
text, at Iowa State College and in 1948 proposed that the two sets be 
amalgamated into a textbook. 

After numerous rewritings, we have settled on the present version of 
a combined textbook in mathematical statistics and a reference book for 
the research worker. This book has been divided into two parts. Part I 
presents basic statistical theory with some emphasis on research problems;
Part II presents the theory of least squares and its use in the analysis
of experimental data. Bancroft had the primary responsibility of
writing Part I and Anderson of Part II, although there have been frequent 
consultations between the authors to maintain reasonable continuity 
throughout the book. At North Carolina State College, this material 
has been used in three courses: (1) a one-year course in applied mathematical
statistics, (2) some of the elementary parts of Chaps. 17 to 24 as
supplementary material for a one-quarter course in design of experiments,
and (3) the more advanced material in Part II for a one-quarter
course in advanced experimental statistics. At Iowa State College, 
Part I plus the material on regression has been taught in a two-quarter 
theory course, and selected sections of Part II in a one-quarter advanced 
methods course. 

Many research workers have expressed a need for a convenient reference
book on statistical theory pointed to research problems, which
could be used in conjunction with their books on general statistical 
methods, experimental design, and survey sampling. The authors have 
tried to write a book which would serve this purpose as well as that of a 
textbook in statistical theory. They realize the difficulties in such an
undertaking and will welcome suggestions on methods of improving this
book both from a reference and from a text standpoint, without adding 
materially to the complexity of the material or the length of the book. 

If this book is used as a class text, the teacher should make a choice of 
the topics to be covered. It was not contemplated that all of the material 
could be covered in a one-year course. Points which must be considered
in deciding on topics to be taken up are number of lecture and
laboratory hours available, interests of the students, previous training 
of the students (in mathematics, statistics, and experimentation), and 
purpose of the course (terminal or part of a sequence). A suggested list 
of chapters and sections for a one-year course in statistics is presented 
in the next section. We have postulated that only a good background 
in differential and integral calculus is required and that the student might 
use this book as an introduction to statistics. However, a previous or 
concurrent course in statistical methods is quite helpful. Several specialized
topics in matrix algebra and advanced calculus, necessary to an
understanding of certain parts of the theory, have been included. Of 
course, previous work in these or other courses in mathematics would 
be helpful. 

Perhaps the best method of studying the theoretical aspects of statistical
techniques is to look at the examples first and then go back to the
theory. However, the authors feel that it is best to develop a general 
theory first, since many research workers will have their own examples in 
mind when they read this material, and the authors want the theory to 
be general enough to apply to these examples, not pointed at examples in 
the text. In lecturing on the material in this book, it may be advisable 
to present the theory and an example together. 

It should be noted that the material in Part II, plus the necessary 
background material in Part I, could very well form the basis for an 
introductory course in the theory and methods of experimental design. 
This course would include the omitted sections in the suggested outline 
for a first course in statistics, especially Chaps. 19, 23, and 24. 

As indicated earlier in this preface, the authors are indebted to Prof. 
W. G. Cochran for a majority of their early ideas in mathematical and 
applied statistics. It is understood that they are responsible for the 
interpretation and amplification of these ideas as presented in this book 
and for any mistakes in theory or application therein. They wish to 
express their appreciation to Profs. G. E. Nicholson and H. F. Smith, 
who made valuable suggestions for improving the presentation after 
teaching from the original mimeographed materials, and to Om Aggarwal 
for his help in working through the examples and exercises in Part I. 
In addition they received many helpful comments from other staff members
and graduate students and from the reviewers of the original manuscript.
Special thanks are due their typists, Mrs. Jeanne Rathz and
Mrs. Margaret Kirwin, who did yeoman service in interpreting the
authors’ hieroglyphics.

The authors are indebted to Prof. George W. Snedecor and the Iowa 
State College Press for permission to use the many examples and tables 
taken from Statistical Methods. They are also indebted to Miss Catherine 
Thompson and Prof. Egon S. Pearson, editor of Biometrika, for permission
to include Table II in the Appendix, which is an abridged version
of a table presented in Biometrika; and to Prof. Ronald A. Fisher, Cambridge,
Dr. Frank Yates, Rothamsted, and Messrs. Oliver & Boyd, Ltd.,
Edinburgh and London, for permission to reprint Table III in the Appendix,
which is an abridged version of Table III in their book Statistical
Tables for Biological, Agricultural, and Medical Research. Finally they
express their sincere appreciation for permission to use the various sets of 
experimental data included in their examples and exercises. 

R. L. Anderson 
T. A. Bancroft 

Raleigh, N. C. 

Ames, Iowa 
July, 1952

SUGGESTED TOPICS FOR A ONE-YEAR COURSE


Part I 

1. Use all of the first four chapters, except perhaps to omit cumulants 
(Sec. 4.6). 

2. Perhaps omit the k-variate case in Chap. 5 (Sec. 5.9).

3. Emphasize linear forms in Chap. 6 (Sec. 6.4 to 6.7). 

4. Perhaps omit the Fisher-Behrens problem in Sec. 7.11. 

5. Chapters 8 to 11 are concerned with the fundamental theory of estimation
and tests of significance. The teacher may wish to discuss only
the applications of these ideas without taking up all of the theoretical 
details. For example, it may be necessary to discuss the concept of 
power (Secs. 11.4 to 11.6) in a condensed manner and spend most of 
the time in Chap. 11 on the methods of setting up the likelihood-ratio 
criterion (Sec. 11.7). 

6. It may be necessary to delete the theoretical parts of Chap. 12 and all 
of Secs. 12.6 and 12.8. 

Part II 

1. Study all of Chap. 13, but only Sec. 14.1 of Chap. 14, unless matrix- 
algebra methods are desired. 

2. It may be desirable to teach only one method of computation in Chap. 
15 and omit either Sec. 15.2 or Sec. 15.3. 

3. All of Chap. 16 probably should be omitted except for a short assignment
on curve fitting alone, with no theory.

4. A selection of topics from Chaps. 17 to 25 should be made. The 
following is suggested: 

(a) Chapter 17 is a discussion chapter and should be read, but the
authors doubt that the student will understand much of it unless
he has handled some experiments previously.

(b) One or two examples from each type of design in Chap. 18 should
suffice. 

(c) Omit Chap. 19 (incomplete-blocks designs). 

(d) Use only two or three examples in Chap. 20. Omit Secs. 20.5 and
20.6.

(e) Possibly omit the theory for covariance (Chap. 21). Omit Secs.
21.3 and 21.4.

(f) Since the theory of variance components is quite complicated, it
may be necessary to consider only a few simple problems. Omit 
Sec. 22.4. 

(g) It is doubtful that a beginner in statistical theory will be able to
understand Chap. 23 (the mixed model), but it might be instructive
to go through a few examples to point out the difficulties involved.
This chapter was written primarily for research workers, who generally
want to use a mixed model. The split-plot design should be
pointed out (Sec. 23.2).

(h) Omit Chap. 24. 

(i) In Chap. 25, omit Secs. 25.1 and 25.3, but go over Secs. 25.2 and
25.4.

CHAPTER 1

INTRODUCTION 


1.1. What Is Statistics? Dissatisfied with attempts at giving a 
precise formal definition of their subject, mathematicians have on occasion
defined mathematics as being those professional activities engaged in
by mathematicians. Later in this chapter an attempt will be made to
give a more formal definition of the subject of statistics. For the present,
however, if in the above definition the words “mathematics” and “mathematician”
are replaced by “statistics” and “statistician,” a definition
of statistics might be written as follows: 

Statistics comprises those professional activities engaged in by statisticians. 

In order to gain some insight into the nature of statistics, then, it 
remains to present in some detail the steps taken by a statistician in 
some simple yet representative investigation. 

1.2. A Representative Investigation. The problem is: Is test material 
B, which is the same as standard material A with one added chemical, a 
more deadly fly spray than A?

Step a. A hypothesis, sometimes referred to as a null hypothesis, is 
set up. In this case a pertinent hypothesis, for which simple techniques 
for testing are available, is: Test material B, with the added chemical, is
equally effective as standard material A in killing flies. 

Step b. An experiment is designed to test the hypothesis. Four
batches of 100 randomly selected flies are sprayed with each spray.

Step c. Pertinent data are collected and tabulated. The numbers of
flies killed in each batch are given in the following table:


         A    Rank      B    Rank

        60      7      68      3
        61      6      69      2
        67      4      64      5
        56      8      71      1
       ---     --     ---     --
Total  244     25     272     11


The numbers killed in the eight batches have been ranked from 1 to 8 
and the ranks summed for A and B separately in order to make use of the 
simple techniques for testing mentioned in step a. A low sum indicates 
a more effective kill.

Now, if B is actually more effective than A, we should expect the experiment
to give evidence of this, that is, the sum of the ranks, S, for B would
be expected to be less than that for A. The total of the eight ranks is
1 + 2 + · · · + 8 = 36; hence, the sum for A is 36 - S. The possible
values of S range from 1 + 2 + 3 + 4 = 10 to 5 + 6 + 7 + 8 = 26.
In our experiment S = 11. A critical question is now in order: Does this
low value of S, for this one experiment, indicate that B is more effective
than A, or is there a high probability that one could obtain a value of
S = 11 or lower by chance when B was actually no more effective than A?

Step d. The distribution of the data, on the assumption that the null
hypothesis is true, is obtained. In order to answer the question posed in
step c, we assume that the null hypothesis is true, that is, that A and B
are equally effective fly killers, and find the probability, $\alpha$, that such a low
value (or a lower one) as S = 11 could have been obtained under this
assumption. If this probability is low, we reject the null hypothesis,
knowing that there is a remote possibility (measured by $\alpha$) that the null
hypothesis is correct.

To effect the above, we find the number of ways of obtaining the various 
possible values of S. We need to consider only one of the groups, in this 
instance the B group, since the rank numbers in the A group will be the 
four not used in B. Now, S = 10 may be obtained in only 1 way as the
sum of 1, 2, 3, 4, while S = 12 may be obtained in 2 ways as the sum of
either 1, 2, 3, 6 or 1, 2, 4, 5, etc. A table of such values is set out below:


Sum of ranks, S         10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26

Ways to form sums, N     1   1   2   3   5   5   7   7   8   7   7   5   5   3   2   1   1


A short-cut method of obtaining the various values of N is given by 
Wilcoxon.¹ In the next chapter, a short-cut method will also be given
for determining the number of ways of dividing 8 quantities into 2 groups
of 4 each. For the present we simply note that $\sum N = 70$, where the
symbol $\sum$ is used to mean “the sum of.”

Now, there are only 2 possibilities out of a total of 70 in which B could 
appear as effective as (or better than) it did in the performed experiment. 
Or, stated another way, if A and B were equally effective, we could expect 
as good or better showing by B in only 2 out of 70 experiments on the 
average. 
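
The counting argument of steps c and d can be checked by brute force. The following is a minimal sketch in Python, not part of the original text; the variable names are illustrative only. It ranks the eight kills, sums the ranks for B, and lists every way of choosing 4 ranks out of 8 to recover the table of N.

```python
from itertools import combinations

# Kill counts from the table in step c.
kills_a = [60, 61, 67, 56]
kills_b = [68, 69, 64, 71]

# Rank all eight batches; rank 1 goes to the largest kill (there are no ties).
ordered = sorted(kills_a + kills_b, reverse=True)
rank = {value: position + 1 for position, value in enumerate(ordered)}

s_observed = sum(rank[v] for v in kills_b)  # rank sum S for B
s_a = sum(rank[v] for v in kills_a)         # rank sum for A, equal to 36 - S

# Under the null hypothesis every choice of 4 ranks out of 8 is equally likely.
all_sums = [sum(c) for c in combinations(range(1, 9), 4)]
ways = {s: all_sums.count(s) for s in range(10, 27)}

# Number of arrangements in which B does as well as, or better than, observed.
as_extreme = sum(n for s, n in ways.items() if s <= s_observed)
p_value = as_extreme / len(all_sums)

print(s_observed, s_a)  # 11 25
print(ways)
print(as_extreme, len(all_sums), round(p_value, 3))
```

Enumeration is practical here only because there are just 70 arrangements; for larger experiments a short-cut method of the kind cited above would be needed.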

Step e. A test of significance is performed. Usually we decide on an
acceptable probability level, $\alpha$, before the experiment is conducted, and
if the results give a probability < $\alpha$, we state that the null hypothesis is
rejected at the $\alpha$ significance level. Two commonly used values of $\alpha$ are
0.05 and 0.01. If the null hypothesis is rejected at the $\alpha = 0.05$ level, the
results are said to be significant; if at the $\alpha = 0.01$ level, highly
significant.

In our experiment $\alpha = 2/70 = .03$. Hence we might well reject the null hypothesis
at the $\alpha = .03$ probability level. In this case, if the null hypothesis
were not rejected, we should be forced to accept the happening of an
improbable event.

Our experiment is not large enough ever to attain high significance,
since even if every B kill was larger than the highest A kill, $\alpha$ could not be
less than 1/70 = .014. In order to have an experiment sensitive enough
to detect differences at the $\alpha = .01$ level, more batches of flies must be
used. In Chap. 7 we shall present a method of testing the null hypothesis
which utilizes the actual number of flies killed. This so-called t test
is more sensitive to the detection of real treatment differences than our
ranking test. However, this new test requires the development of a more
complex theoretical background.

In tests like the t test the observations or sample values are specified as 
having been drawn from parent populations of known mathematical form 
containing one or more unknown parameters. In such cases the null 
hypothesis is a statement concerning the parameter(s). R. A. Fisher²
has defined the basic theoretical problems underlying modern applied 
statistics as those of specification, distribution, estimation, and tests of 
hypotheses. The sample observations are specified as having been drawn 
from some parent population with unknown parameter(s). Functions of 
the sample observations are calculated as estimators of these parameters. 
The mathematical forms of the distributions of these functions obtained 
from repeated samplings are derived. Properties of estimators are 
investigated in order that an appropriate estimator may be used in an 
applied situation. Appropriate test criteria are constructed in order that 
valid tests of hypotheses may be made. 

The representative investigation described above is concerned with an 
experiment. A second large class of investigations involving the use of 
statistical methodologies is that of the sample survey. Either may lead to 
estimates of population parameters, the setting of confidence limits, or 
tests of significance. 

Close inspection of the steps taken by the statistician in the above 
representative investigation will reveal the fact that most are similar or 
identical with those taken by workers engaged in many different fields of 
scientific enquiry. What essential characteristic then is peculiar to the 
professional activities of the statistician? A careful comparison will
reveal that this essential characteristic is the use of the mathematics of 
probability to calculate from the observations themselves a measure of 
the fallibility of conclusions and estimates. However, valid and efficient
measures of the fallibility of conclusions in terms of exact probability
statements are possible only if the earlier steps in the investigation are
taken with this end product in mind. Hence, the statistician finds himself
vitally concerned with matters not strictly concerned with this main
aspect, such as statement of the null hypothesis, design of experiments
and sample surveys, questionnaire construction and training and supervision
of enumerators, experimental techniques, collection and tabulation
of data, specification of the parent population distribution, and interpretation
of results.

1.3. A Formal Definition of Statistics. With the above discussion
in mind, statistics might be given the following formal definition:

Statistics is the science and art of the development and application of the
most effective methods of collecting, tabulating, and interpreting quantitative 
data in such a manner that the fallibility of conclusions and estimates may 
be assessed by means of inductive reasoning based on the mathematics of 
probability. 

1.4. Probability. It was stated in the definition that statistics uses 
inductive reasoning based on the mathematics of probability. In this 
respect, then, statistics is a branch of applied mathematics whose methodologies
stem from the axioms and theorems of probability, which in turn
is a branch of pure mathematics. A definition of the probability of the 
occurrence of an event would appear to be in order. Unfortunately there 
is no general agreement among workers in the field as to what constitutes 
a satisfactory definition. For reasons of simplicity the classical definition 
will be given, which is as follows: 

If an event can occur in N equally likely and mutually exclusive ways, and 
if n of these ways have an attribute A, then the probability of the occurrence of 
A is n/N. 

This is the definition of a priori probability, that is, it assumes that it is 
possible logically to determine, before trials are made, all the equally 
likely and mutually exclusive ways that an event may happen and to 
assign n of these ways to the occurrence of attribute A. 

Example 1.1. What is the probability of obtaining a head with a
penny on a single toss? Assuming the coin a “true” coin, we reason that
it may fall 2 equally likely ways and that 1 of them must be heads; hence,
the probability is 1/2.

Notice that the classical definition of probability, in using the words
“equally likely,” assumes a knowledge of probability in order to define
the term. Logically, of course, this is certainly undesirable, but a more 
satisfactory definition must await a higher level of mathematical maturity 
than that assumed for this text. 

In actual practice it would appear that a priori probability might have
limited usefulness. It may not be possible in a great many important
scientific problems to determine logically, before trials are made, all the
equally likely and mutually exclusive ways that an event may happen
and to assign n of these ways to the occurrence of attribute A. For
instance, even in the example, the penny may have a tendency to turn up
heads more often than tails, and the probability of heads, then, is no
longer 1/2. But what is the probability of heads now? Suppose the penny
is tossed 100 times and 55 heads are noted. The probability of a head
might be tentatively set as .55. However, we have only an estimate
based on 100 tosses of some unknown postulated “true” probability in
a theoretical infinite population of throws. This estimate is called the
empirical probability.
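
The idea of an empirical probability as a relative frequency can be illustrated by simulation. The sketch below is in Python and is not part of the original text; the "true" probability of .55 assigned to the biased penny, the seed, and the function names are all assumptions made only so that there is something to estimate.

```python
import random

def empirical_probability(tosses):
    """Relative frequency of heads in a list of 'H'/'T' outcomes."""
    return tosses.count("H") / len(tosses)

# Hypothetical biased penny: this value is an assumption for the simulation,
# standing in for the unknown "true" probability discussed in the text.
TRUE_P = 0.55
rng = random.Random(1952)  # fixed seed so that a run can be repeated

def toss(n):
    """Simulate n independent tosses of the hypothetical penny."""
    return ["H" if rng.random() < TRUE_P else "T" for _ in range(n)]

small = empirical_probability(toss(100))      # estimate from 100 tosses
large = empirical_probability(toss(100_000))  # estimate from many more tosses
print(small, large)
```

The estimate from 100 tosses can differ noticeably from the assumed value, while the estimate from a much longer run is usually closer; neither is the "true" probability itself.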

What then is the connection between a priori probability and the sample
estimate of some unknown hypothetical population probability, that
is, the empirical probability? With the a priori definition and certain
postulates the mathematics of probability provides fundamental laws or
theorems of probability which in turn make possible the solution of many
classes of problems. If the unknown hypothetical population probability
and its sample estimate, the empirical probability, be assumed to be
amenable to the same fundamental laws, then a means becomes available 
for solving many important problems in the empirical sciences. 

EXERCISES 

1.1. Using the last three observations for A and B in the data given in 
step c, test the same null hypothesis of step a. 

1.2. Add the two observations A (66), B (70) to the data of step c, and 
test the null hypothesis of step a. 

1.3. Follow the instructions of Exercise 1.2, adding only A (66). 

1.4. What is the a priori probability of obtaining a 7 with a pair of ordinary
dice, if the dice are assumed to be “true”? Assuming that the pair
of dice are not “true,” how could one obtain a reasonable estimate of the
unknown probability of obtaining a 7, that is, the empirical probability? 

1.5. Read references cited 1 and 2.

CHAPTER 2


PROBABILITY 

2.1. Introduction. It was stated earlier that statistics concerns itself
with inductive reasoning based on the mathematics of probability. In
developing statistical methodology the statistician makes use of the
definitions, postulates, and theorems of mathematical probability. What
can be said of this framework, these “bones,” of statistics? The theory
of probability had its genesis in the application of mathematics to determining
the odds in various games of chance: dice, cards, spun wheels, etc.
In particular, the foundations of the science of probability were laid by two
seventeenth-century mathematicians, Pascal and Fermat, in their private
correspondence concerning questions raised regarding the gambling
observations of the French nobleman, Chevalier de Méré. Books on
games of chance are still being written by workers in probability and
statistics.

Statistics is then no dry-as-dust subject concerning itself with the
compilation of innumerable tables and charts. On the contrary, it deals 
with the development and application of an important methodology based 
on the fascinating subject of probability. This methodology has become 
of great importance as a research tool in the physical, biological, and 
social sciences. 

2.2. Number of Ways an Event Can Occur: Permutations and Combinations.
The number of ways in which an event may occur may be
determined by enumeration or by the use of some simple rules from college
algebra. The latter method is simpler for more complicated problems.
Two fundamental theorems are:

Rule 2.1. If A can happen in m ways and B in n ways independent (the
occurrence of one does not affect the chance of the occurrence of the other) of m,
then both A and B can happen in mn ways.

Example 2.1. If two ordinary dice numbered 1 and 2 are tossed, one
may appear face up in 6 ways which are independent of the 6 ways in
which the second may appear. Hence, both may appear face up together
in 36 different ways.

Rule 2.2. If A can happen in m ways and B in n ways mutually exclusive
(the occurrence of one precludes the occurrence of the other) of m, then either
A or B can happen in m + n ways.

Example 2.2. Either an ace or a king (one card) may be drawn from
an ordinary deck of cards in 4 + 4 = 8 ways.

For multiple arrangements, a rule for permutations can be applied. If
it be desired to arrange n different objects into sets of r objects per set, the
number of such arrangements is called “the number of permutations of n
objects taken r at a time” and is indicated by P(n,r). The first of the r
positions can be filled in n ways, the second in (n - 1) ways since one
object will have been used in the first position, the third in (n - 2) ways,
etc. Hence, by Rule 2.1:

Rule 2.3.

$$P(n,r) = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!},$$

where $n! = n(n-1)\cdots 2 \cdot 1$.

It should be noted that, when r = n, P(n,n) = n!, which implies that
0! = 1.

Example 2.3. The number of different ways of selecting a president, 
vice-president, and secretary from a suggested slate of 6 is 

P(6,3) = 6 · 5 · 4 = 120.
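
Rule 2.3 can be checked against a direct listing of the arrangements. The following Python sketch is illustrative only and not part of the original text; the six candidate labels are hypothetical, and `math.perm` is the standard-library routine (Python 3.8 or later) for the same quantity.

```python
from itertools import permutations
from math import factorial, perm

def n_permutations(n, r):
    """Rule 2.3: P(n, r) = n! / (n - r)!."""
    return factorial(n) // factorial(n - r)

# Hypothetical labels for the 6 people on the slate of Example 2.3.
slate = ["A", "B", "C", "D", "E", "F"]

# Each ordered triple is one assignment (president, vice-president, secretary).
by_listing = len(list(permutations(slate, 3)))

print(n_permutations(6, 3), perm(6, 3), by_listing)
```

All three counts agree, which is the content of the rule: the formula replaces the listing when n is too large to enumerate.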

Suppose that all n objects in the arrangement are used, but certain
groups $n_1$, $n_2$, etc., are alike. Any rearrangement of the objects of any $n_i$
group will not change any particular arrangement; hence, the total number
of arrangements will be less than if all the objects were different from
one another. Now, any group of $n_i$ alike objects can be arranged in $n_i!$
ways, and since these $n_i!$ arrangements are alike for every arrangement
of the other objects, the total number of different arrangements will be
given as below:

Rule 2.4.

$$P(n; n_1, n_2, n_3, \ldots) = \frac{n!}{n_1!\, n_2!\, n_3! \cdots},$$

where $P(n; n_1, n_2, n_3, \ldots)$ represents the total number of permutations, given that $n_1$ are alike,
$n_2$ alike but different from the first group, etc., and $\sum n_i = n$.

Example 2.4. How many different 6-flag signals may be made if 3
are red, 2 blue, and 1 yellow?

$$P(6; 3, 2, 1) = \frac{6!}{3!\, 2!\, 1!} = 60.$$
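
Rule 2.4 can be verified for the flag signals by listing all 6! orderings and keeping only the distinct ones. This Python sketch is an illustration added here, not part of the original text; the function name and the one-letter flag codes are assumptions.

```python
from itertools import permutations
from math import factorial

def multiset_permutations(*group_sizes):
    """Rule 2.4: n! / (n_1! n_2! ...), where n is the sum of the group sizes."""
    count = factorial(sum(group_sizes))
    for size in group_sizes:
        count //= factorial(size)
    return count

# 3 red, 2 blue, 1 yellow flag, coded by their initial letters.
flags = "RRRBBY"

# Orderings that differ only by swapping like-colored flags collapse in the set.
distinct_signals = len(set(permutations(flags)))

print(multiset_permutations(3, 2, 1), distinct_signals)
```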

If interest lies only in groups of objects and not in the arrangements
within the groups, then combinatorial rules apply. The total number of
combinations of n objects taken r at a time is denoted symbolically as
C(n,r). It is easily seen that

$$P(n,r) = C(n,r)\, P(r,r),$$

since each combination of r objects may be permuted P(r,r) times. The
following rule is now derived:

Rule 2.5.

$$C(n,r) = \frac{n!}{r!\,(n-r)!}.$$

Example 2.5. The total number of different bridge hands of 13 cards
which can be dealt from a deck of 52 cards is

$$C(52,13) = \frac{52!}{13!\, 39!}.$$

Again, the total number of sets of 4 bridge hands is

$$C(52,13) \cdot C(39,13) \cdot C(26,13) \cdot C(13,13) = \frac{52!}{(13!)^4}.$$
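
The bridge-hand counts are too large to list but easy to evaluate exactly with integer arithmetic. The Python sketch below is illustrative and not part of the original text; it uses the standard-library `math.comb` (Python 3.8 or later) and checks the product of combinations against the closed form.

```python
from math import comb, factorial

# Number of different 13-card hands from a 52-card deck (Rule 2.5).
one_hand = comb(52, 13)

# Deal four hands in turn: 13 from 52, 13 from 39, 13 from 26, 13 from 13.
four_hands = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)

# Closed form obtained by cancelling the intermediate factorials.
closed_form = factorial(52) // factorial(13) ** 4

print(one_hand)
print(four_hands == closed_form)
```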

2.3. Stirling’s Approximation. The use of the rules of permutations
and combinations involves factorials, some with quite large values of n.
Stirling's formula,

$$n! = \sqrt{2\pi n}\; n^n e^{-n} \left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right),$$

may be used to obtain quickly an approximation to n!. The first term,

$$n! \approx \sqrt{2\pi n}\; n^n e^{-n},$$

gives a suitable approximation in many cases.

Example 2.6. Evaluate 13! by the use of Stirling's formula.

$$\log 13! \approx \tfrac{1}{2}(\log 26 + \log \pi) + 13 \log 13 - 13 \log e + \log\left(1 + \frac{1}{12 \cdot 13}\right),$$

$$13! \approx 6.2271 \times 10^{9},$$

using a 5-place logarithm table.
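
The quality of Stirling's approximation for 13! can be examined numerically. The following Python sketch is an illustration, not part of the original text; the function name and the `terms` argument are assumptions, and floating-point arithmetic replaces the logarithm table.

```python
import math

def stirling(n, terms=1):
    """Stirling's formula for n!, keeping the first `terms` terms of the
    series 1 + 1/(12n) + 1/(288 n^2) (terms may be 1, 2, or 3)."""
    leading = math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)
    series = [1.0, 1 / (12 * n), 1 / (288 * n**2)]
    return leading * sum(series[:terms])

exact = math.factorial(13)
for terms in (1, 2, 3):
    approx = stirling(13, terms)
    # Relative error shrinks as more terms of the series are kept.
    print(terms, approx, abs(approx / exact - 1))
```

For n = 13 the first term alone is off by well under 1 per cent, and adding the 1/(12n) correction reduces the relative error by roughly two further orders of magnitude.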

2.4. Probability and Arrangements. After obtaining the total
number of mutually exclusive and equally likely ways and those which
possess attribute A by the use of the rules of permutations and combinations,
it is then possible to write the required probability by applying the
fundamental definition

$$\mathcal{P} = \frac{\text{number of ways that possess attribute } A}{\text{total number of ways}}.$$

Example 2.7. A bag contains 4 red and 3 white balls. What is the
probability of obtaining exactly 3 red balls when 3 balls are drawn?

$$\mathcal{P} = \frac{C(4,3)}{C(7,3)} = \frac{4}{35}.$$
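
Example 2.7 can be confirmed both from the formula and by listing every possible draw. This Python sketch is illustrative only and not part of the original text; exact fractions are used so that the result can be compared with 4/35 directly.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Formula: choose 3 of the 4 red balls, out of all ways to choose 3 of 7 balls.
prob = Fraction(comb(4, 3), comb(7, 3))

# Enumeration: label the 7 balls 0..6 and list every unordered draw of 3.
balls = ["R"] * 4 + ["W"] * 3
draws = list(combinations(range(7), 3))
all_red = sum(1 for d in draws if all(balls[i] == "R" for i in d))

print(prob, Fraction(all_red, len(draws)))
```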

2.5. Fundamental Laws of Probability

Law 1. If A and B are two mutually exclusive events (the occurrence of
one precludes the occurrence of the other), then the probability of either of
them happening is the sum of the respective probabilities. Symbolically,
$\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B)$.

Example 2.8. The probability of throwing either a 7 or an 8 with
two dice is $\frac{6}{36} + \frac{5}{36} = \frac{11}{36}$.

Law 2. If A and B are two independent events, so that the occurrence of
one does not affect the chance of the occurrence of the other, the probability
that both happen is the product of their respective probabilities. Symbolically,
$\mathcal{P}(AB) = \mathcal{P}(A) \cdot \mathcal{P}(B)$.

Example 2.9. The probability of getting 2 red balls in drawing 1 ball
from each of two urns containing 6 red balls and 4 black balls is $\frac{6}{10} \cdot \frac{6}{10} = \frac{9}{25}$.

If A and B are not independent of one another so that the occurrence
of one affects the probability of the occurrence of the other, then a definition
is needed for the conditional probability of A given that B has
occurred, which is denoted $\mathcal{P}(A \mid B)$. Similarly the conditional probability
of B given that A has happened is $\mathcal{P}(B \mid A)$. In such cases fundamental
law 2 becomes:

Law 3. $\mathcal{P}(AB) = \mathcal{P}(A) \cdot \mathcal{P}(B \mid A) = \mathcal{P}(B) \cdot \mathcal{P}(A \mid B)$. If A and B are
independent, $\mathcal{P}(B \mid A) = \mathcal{P}(B)$ and $\mathcal{P}(A \mid B) = \mathcal{P}(A)$.

Example 2.10. If both balls were drawn in succession from one of the
urns in Example 2.9 without replacement, then the probability of obtaining
2 red balls is $\frac{6}{10} \cdot \frac{5}{9} = \frac{1}{3}$.
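
Laws 1 to 3 can each be checked by counting equally likely outcomes, as in Examples 2.8 to 2.10. The Python sketch below is an illustration added here, not part of the original text; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

# Law 1 (Example 2.8): a 7 and an 8 are mutually exclusive totals of two dice.
dice = list(product(range(1, 7), repeat=2))
p7 = Fraction(sum(1 for a, b in dice if a + b == 7), len(dice))
p8 = Fraction(sum(1 for a, b in dice if a + b == 8), len(dice))
p_7_or_8 = p7 + p8

# Law 2 (Example 2.9): one ball from each of two urns, 6 red and 4 black each.
urn = ["R"] * 6 + ["B"] * 4
pairs = list(product(urn, urn))
p_two_red_indep = Fraction(sum(1 for x, y in pairs if x == y == "R"), len(pairs))

# Law 3 (Example 2.10): two balls in succession from one urn, no replacement.
ordered = list(permutations(range(10), 2))
p_two_red_dep = Fraction(
    sum(1 for i, j in ordered if urn[i] == "R" and urn[j] == "R"), len(ordered)
)

print(p_7_or_8, p_two_red_indep, p_two_red_dep)
```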

Law 4. If two events are not mutually exclusive, then the probability of
at least one of them occurring is $\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B) - \mathcal{P}(AB)$.

Proof. Let $\bar{A}$ and $\bar{B}$ represent the nonoccurrence of A and B, respectively.
Then,

$$\mathcal{P}(A) + \mathcal{P}(B) = \mathcal{P}(AB) + \mathcal{P}(A\bar{B}) + \mathcal{P}(BA) + \mathcal{P}(B\bar{A}) = \mathcal{P}(AB) + \mathcal{P}(A + B),$$

$$\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B) - \mathcal{P}(AB),$$

where $\mathcal{P}(A\bar{B})$ is the probability of A occurring and B not occurring, and
similarly for $\mathcal{P}(B\bar{A})$.

Law 4 is illustrated in Fig. 2.1, where the outcomes of a chance event
are represented by points in a plane. Then the outcomes belonging to 
either A or B may be represented by the points in A and B less the points 
common to region AB since AB would be counted twice. 

This law may be extended, for example,

$$\mathcal{P}(A + B + C) = \mathcal{P}(A) + \mathcal{P}(B) + \mathcal{P}(C) - \mathcal{P}(AB) - \mathcal{P}(AC) - \mathcal{P}(BC) + \mathcal{P}(ABC).$$

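Example 2.11. In Example 2.9 the probability of obtaining at least
one red ball is

$$\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B) - \mathcal{P}(AB) = \frac{6}{10} + \frac{6}{10} - \frac{6}{10} \cdot \frac{6}{10} = \frac{21}{25}.$$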
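
Law 4 can be checked for the two-urn case by counting the 100 equally likely pairs of draws. This Python sketch is illustrative and not part of the original text; `prob` is a hypothetical helper defined only for this example.

```python
from fractions import Fraction
from itertools import product

# One ball from each of two urns, each holding 6 red and 4 black balls.
urn = ["R"] * 6 + ["B"] * 4
outcomes = list(product(urn, urn))

def prob(event):
    """Classical probability: favorable outcomes over all outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_a = prob(lambda o: o[0] == "R")                   # red from the first urn
p_b = prob(lambda o: o[1] == "R")                   # red from the second urn
p_ab = prob(lambda o: o[0] == "R" and o[1] == "R")  # red from both
p_either = prob(lambda o: "R" in o)                 # at least one red

print(p_a + p_b - p_ab, p_either)
```

Subtracting the probability of both events removes the outcomes that would otherwise be counted twice.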
Law 5. If the probability of an event occurring in a single trial is p,
the probability of its occurring r times out of n trials is given by

$$C(n,r)\, p^r (1 - p)^{n-r} = C(n,r)\, p^r q^{n-r},$$

where 1 - p = q. This is the (r + 1)st term of $(q + p)^n$.

Proof. If the event occurs r times out of n trials, it will fail to occur
n - r times; hence, the probability of the occurrence of any sequence of r
successes and n - r failures is $p^r (1 - p)^{n-r}$. But the number of possible
sequences is given by C(n,r).

Example 2.12. The probability of obtaining exactly 3 heads on a
single toss of 5 coins is $C(5,3)\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right)^2 = \frac{5}{16}$.
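
Law 5 can be compared with a direct count of the 32 equally likely results of tossing 5 coins. The Python sketch below is an illustration, not part of the original text; the function name is an assumption.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_probability(n, r, p):
    """Law 5: probability of exactly r occurrences in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)

half = Fraction(1, 2)
p_three_heads = binomial_probability(5, 3, half)

# Direct count over all 2^5 sequences of heads and tails.
tosses = list(product("HT", repeat=5))
by_count = Fraction(sum(1 for t in tosses if t.count("H") == 3), len(tosses))

# The terms for r = 0, 1, ..., 5 are the expansion of (q + p)^5 and sum to 1.
total = sum(binomial_probability(5, r, half) for r in range(6))

print(p_three_heads, by_count, total)
```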

Fig. 2.1.

2.6. A Posteriori Probability. In the previous sections it was assumed
that the causal system was known a priori; hence, exact probabilities of
various results were readily calculated. In tossing a die it was assumed
that all 6 faces appeared “equally likely” and that a random toss of the
die was made. In such cases the probability of obtaining any number
from 1 to 6 was easily seen to be 1/6. With these same assumptions and
use of the fundamental laws of probability it was also easy to state the
probability of obtaining a 7 on a single throw, say, with two dice.

In statistics, however, one is often faced with exactly the reverse of this 
situation. A batch of data resulting from some experiment is at hand, 
and we wish to state the probability that such data could have been produced
by a given causal system. For example, it is noted that two
hundred 7’s were obtained in tossing two dice 1,000 times. We now wish 
to know the probability that such a result could have been produced with 
unbiased dice. Or we may wish to state, on the basis of these results, the 
expected number of 7’s to be obtained on the next 100 tosses of two dice. 
These problems concern a posteriori probability, probability based on 
previous occurrences. 

A posteriori probabilities, under certain conditions, may be obtained by the use of Bayes's formula. Let $B_1, B_2, \ldots, B_n$ be n mutually exclusive random events, of which one is certain to occur. Let $\mathcal{P}(B_i)$ be the probability of the occurrence of $B_i$. Let E be an event which can occur only if one of the set of B's occurs. Let $\mathcal{P}(E|B_i)$ be the conditional probability for E to occur, assuming the occurrence of $B_i$. We wish to know how the probability of $B_i$ changes with the added information that E has actually happened. In other words, we wish the conditional probability $\mathcal{P}(B_i|E)$. Using law 3,

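$$\mathcal{P}(EB_i) = \mathcal{P}(B_i)\,\mathcal{P}(E|B_i),$$

and

$$\mathcal{P}(EB_i) = \mathcal{P}(E)\,\mathcal{P}(B_i|E).$$

Equating the right-hand sides of these two equations and solving for $\mathcal{P}(B_i|E)$, we obtain

$$\mathcal{P}(B_i|E) = \frac{\mathcal{P}(B_i)\,\mathcal{P}(E|B_i)}{\mathcal{P}(E)}.$$

Since E may occur with any of the mutually exclusive events $B_i$,

$$\mathcal{P}(E) = \mathcal{P}(EB_1) + \mathcal{P}(EB_2) + \cdots + \mathcal{P}(EB_n)$$

$$= \mathcal{P}(B_1)\,\mathcal{P}(E|B_1) + \mathcal{P}(B_2)\,\mathcal{P}(E|B_2) + \cdots + \mathcal{P}(B_n)\,\mathcal{P}(E|B_n).$$

Upon substituting this last result in the denominator of the preceding equation, we obtain Bayes's formula,

$$\mathcal{P}(B_i|E) = \frac{\mathcal{P}(B_i)\,\mathcal{P}(E|B_i)}{\mathcal{P}(B_1)\,\mathcal{P}(E|B_1) + \mathcal{P}(B_2)\,\mathcal{P}(E|B_2) + \cdots + \mathcal{P}(B_n)\,\mathcal{P}(E|B_n)}.$$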


If we consider the events $B_1, B_2, \ldots, B_n$ as hypotheses to account for the occurrence of E, then Bayes's formula provides a means of calculating probabilities of hypotheses. In this case $\mathcal{P}(B_1), \mathcal{P}(B_2), \ldots, \mathcal{P}(B_n)$ are called a priori probabilities of the hypotheses $B_1, B_2, \ldots, B_n$, and $\mathcal{P}(B_1|E), \mathcal{P}(B_2|E), \ldots, \mathcal{P}(B_n|E)$ are called a posteriori probabilities of the same hypotheses.

Example 2.13. Urn I contains 2 white, 1 black, and 3 red balls. 
Urn II contains 3 white, 2 black, and 4 red balls. Urn III contains 4 
white, 3 black, and 2 red balls. One urn is chosen at random, and 2 balls 
are drawn. They happen to be red and black. What is the probability 
that both balls came from urn I? urn II? urn III? 

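We identify E as the event that the 2 balls were one red and one black, in either order. To explain this occurrence, we have three hypotheses: the urn was I, II, or III. We identify these hypotheses with $B_1$, $B_2$, $B_3$.

Then $\mathcal{P}(B_1) = \mathcal{P}(B_2) = \mathcal{P}(B_3) = \tfrac13$, and $\mathcal{P}(E|B_1) = \tfrac{1}{10} + \tfrac{1}{10} = \tfrac15$.

Similarly, $\mathcal{P}(E|B_2) = \tfrac29$, and $\mathcal{P}(E|B_3) = \tfrac16$. Substituting in Bayes's formula,

$$\mathcal{P}(B_1|E) = \frac{\tfrac13 \cdot \tfrac15}{\tfrac13 \cdot \tfrac15 + \tfrac13 \cdot \tfrac29 + \tfrac13 \cdot \tfrac16} = \frac{18}{53}.$$

Similarly,

$$\mathcal{P}(B_2|E) = \frac{\tfrac13 \cdot \tfrac29}{\tfrac13 \cdot \tfrac15 + \tfrac13 \cdot \tfrac29 + \tfrac13 \cdot \tfrac16} = \frac{20}{53},$$

$$\mathcal{P}(B_3|E) = \frac{\tfrac13 \cdot \tfrac16}{\tfrac13 \cdot \tfrac15 + \tfrac13 \cdot \tfrac29 + \tfrac13 \cdot \tfrac16} = \frac{15}{53}.$$

Note that the sum, as we should expect, is 1.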
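The arithmetic of Example 2.13 can be reproduced with exact fractions. This is a sketch only; the helper names `posterior` and `prob_red_and_black` are illustrative and do not come from the text. The likelihood of drawing one red and one black ball is computed as (number of red-black pairs) divided by (number of all pairs) in each urn.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes's formula: a posteriori probabilities of the hypotheses B_i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) P(E|B_i)
    evidence = sum(joint)                                 # P(E)
    return [j / evidence for j in joint]


def prob_red_and_black(white: int, black: int, red: int) -> Fraction:
    """P(one red and one black, in either order) when 2 balls are drawn."""
    total = white + black + red
    all_pairs = total * (total - 1) // 2
    return Fraction(red * black, all_pairs)


# (white, black, red) contents of urns I, II, III in Example 2.13.
urns = [(2, 1, 3), (3, 2, 4), (4, 3, 2)]
priors = [Fraction(1, 3)] * 3
likelihoods = [prob_red_and_black(*urn) for urn in urns]
posteriors = posterior(priors, likelihoods)

print(likelihoods)
print(posteriors)
```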

Example 2.14. (Due to Neyman). Consider the cross of two hybrids $x_1$ and $x_2$ which are heterozygous (Aa) and its progeny $y_1$ having the appearance of a dominant, that is, either doubly dominant (AA) or heterozygous (Aa). Suppose that $y_1$ is crossed with a recessive (aa) designated as $y_2$, resulting in n offspring: $z_1, z_2, \ldots, z_n$. Suppose that not one of these offspring is a recessive, that is, each is either a dominant or a hybrid. It is proposed to find the probability that $y_1$ is (Aa).

Let E be the event that the n offspring have the appearance of dominants; $B_1$ may be identified with the event that $y_1$ is (Aa) and $B_2$ with the event that $y_1$ is (AA). Then, $\mathcal{P}(B_1) = \mathcal{P}(y_1 = Aa) = \tfrac23$, $\mathcal{P}(B_2) = \mathcal{P}(y_1 = AA) = \tfrac13$, and

$$\mathcal{P}(E|B_1) = \mathcal{P}(E|y_1 = Aa) = 1/2^n,$$

and $\mathcal{P}(E|B_2) = \mathcal{P}(E|y_1 = AA) = 1$. Substituting in Bayes's formula,

$$\mathcal{P}(B_1|E) = \mathcal{P}(y_1 = Aa|E) = \frac{\tfrac23 \cdot \tfrac{1}{2^n}}{\tfrac23 \cdot \tfrac{1}{2^n} + \tfrac13 \cdot 1} = \frac{1}{1 + 2^{n-1}}.$$

Giving n the values 1, 2, 3, 4, 5, we obtain Table 2.1.

Table 2.1
A Posteriori Probabilities

  n     P(y_1 = Aa | E)
  1         .500
  2         .333
  3         .200
  4         .111
  5         .059
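Table 2.1 can be regenerated directly from Bayes's formula. The sketch below assumes the Mendelian a priori probabilities 2/3 for (Aa) and 1/3 for (AA), which follow from the cross of two heterozygotes when the progeny is known to look dominant; the function name `prob_heterozygous` is illustrative.

```python
from fractions import Fraction


def prob_heterozygous(n: int) -> Fraction:
    """A posteriori probability that y_1 is (Aa) after n dominant-looking offspring."""
    prior_Aa, prior_AA = Fraction(2, 3), Fraction(1, 3)  # assumed Mendelian priors
    like_Aa = Fraction(1, 2**n)  # each offspring of Aa x aa looks dominant with chance 1/2
    like_AA = Fraction(1)        # offspring of AA x aa always look dominant
    return prior_Aa * like_Aa / (prior_Aa * like_Aa + prior_AA * like_AA)


table = {n: prob_heterozygous(n) for n in range(1, 6)}
for n, prob in table.items():
    print(n, prob, f"{float(prob):.3f}")
```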

Suppose, however, that we do not know anything about the origin of $y_1$. In that case the a priori probabilities $\mathcal{P}(B_1) = \mathcal{P}(y_1 = Aa)$ and $\mathcal{P}(B_2) = \mathcal{P}(y_1 = AA)$ would be unknown, and we would not have sufficient information to evaluate the right side of Bayes's formula. In the past it has been suggested that since $\mathcal{P}(B_1)$ and $\mathcal{P}(B_2)$ are unknown, and we have no reason to favor one more than another, we should assign one-half to each. Modern statistics provides other ways of attacking such problems which seem more reasonable. These methods will be discussed later.







EXERCISES 

2.1. An agronomist is designing an experiment involving the use of 4 
varieties, 3 fertilizers, and 3 spacings. How many different treatment 
combinations, using one from each of the three kinds of treatments, does 
he have? 

2.2. In how many different ways may a Jersey or a Holstein be 
drawn from a mixed herd of 5 Jerseys, 7 Holsteins, 10 Guernseys, and 6 
Brahmans ? 

2.3. How many different ways may a horticulturist arrange 5 different 
potted plants along a line on a greenhouse bench? 

2.4. How many different ways may a student select a major and a 
minor from 5 possible fields? 

2.5. How many different arrangements can be made using the 10 letters from the word "statistics"?

2.6. How many signals can be made by hoisting 6 flags of different 
colors if there are 6 significant positions on the flagpole? Any number of 
the flags may be hoisted at a time. 

2.7. An organism has the possibility of having 1, 2, 3, 4, or 5 out of a 
total of 15 characters. What are the total possible combinations? 

2.8. An industrial engineer is designing an experiment arranged to 
measure sources of variation from 4 factors (runs, journeys, cylinders, and 
pots). If we let R, J, C, and P represent the respective factors, how many 2-factor interactions of the type RJ, etc., are there? How many 3-factor interactions? How many 4-factor interactions?

2.9. Using the relationship

$$(1 + x)^n = 1 + C(n,1)x + C(n,2)x^2 + \cdots + C(n, n-1)x^{n-1} + C(n,n)x^n,$$

show that

$$2^n - 1 = C(n,1) + C(n,2) + \cdots + C(n, n-1) + C(n,n).$$

How many ways can we make a selection of 5 breeds of chickens, taking 
some or all? 

2.10. Show that C(n,r) = C(n, n — r). 

2.11. If C(n,10) = C(n,6), find C(n,3). 

2.12. If C(16,r) = C(16, r - 2), find r. 

2.13. If P(56, r + 6)/P(54, r + 3) = 30,800, find r. 

2.14. A random sample of size n from a finite population of N sampling 
units is one in which every possible combination of size n has an equal 
chance of being chosen. How many different samples of size 10 may be 
drawn from a list of 100 names? Use Stirling's approximation to evaluate the factorials.

2.15. Suppose that in selecting the sample of size 10 in Exercise 2.14 we draw a number from 1 to 10 at random, say 6, and select every tenth name on our list thereafter, that is, 16, 26, etc. Is this method of selection equivalent to random sampling? Why?

2.16. From a pack of 52 cards, 2 are drawn at random; find the proba¬ 
bility that one is a queen and the other a king. 

2.17. There are three events A, B, C, one of which must, and only one 
can, happen; the probability of A not happening is and the probability 
of B not happening is y. Find the probability of C happening. 

2.18. The probability of A solving a certain problem is f, and the prob¬ 
ability of B solving the same problem is What is the probability that 
the problem will be solved if both try? 

2.19. In a family with 6 children, what is the probability that (a) all children will be girls; (b) all children will be of the same sex; (c) the first 5 children will be boys and the sixth a girl; (d) 3 of the children will be boys? Assume the sex ratio is $\tfrac12$.

2.20. Show that under the conditions of law 5 the probability that an event happens at least r times in n trials is

$$C(n,r)\,p^r q^{n-r} + C(n, r+1)\,p^{r+1} q^{n-r-1} + \cdots + C(n, n-1)\,p^{n-1} q + p^n,$$

or the sum of the last (n - r + 1) terms of the expansion of $(q + p)^n$.

2.21. A lady declares that by tasting a cup of tea made with milk she 
can discriminate whether the milk or the tea infusion was first added to 
the cup. Eight cups of tea were mixed, 4 in one way and 4 in the other, 
and the lady was so informed. The cups were then presented, in random 
order, to her for judgment. She was asked to divide the 8 cups into two 
sets of 4, agreeing, if possible, with the treatments received. The lady 
selected 3 right and 1 wrong in each set of the same treatment. On the 
assumption that the lady cannot discriminate between the two methods, 
show that the probability of her doing as well or better by chance is $\tfrac{17}{70}$.

2.22. An urn contains 6 balls which are known to be all red or 4 red and 
2 black. A ball is drawn and found to be red. What is the probability 
that all the balls are red? 

2.23. A male rat is either doubly dominant (AA) or heterozygous (Aa), and, owing to Mendelian properties, the probability of either being true is $\tfrac12$. The male rat is bred to a doubly recessive (aa) female. If the male rat is doubly dominant, the offspring will exhibit the dominant characteristic; if heterozygous, the offspring will exhibit the dominant characteristic $\tfrac12$ of the time and the recessive characteristic $\tfrac12$ of the time. Suppose all of 3 offspring exhibit the dominant characteristic; what is the probability that the male is doubly dominant?

2.24. Chevalier de Mere's problem was concerned with a certain game of dice. Twenty-four throws of a pair of dice are to be allowed, and the player is permitted to bet even money either on the occurrence of at least one "double six" in the course of the 24 throws or against it. Certain theoretical considerations led De Mere to believe that betting on the double six is advantageous. On the other hand his empirical trials appeared to contradict this conclusion. Pascal's solution stated that, if the dice are "fair," the probability of obtaining at least one double six in 24 throws is .491. Check Pascal's results.

CHAPTER 3

UNIVARIATE PARENT POPULATION DISTRIBUTIONS 


3.1. Specification. In Chap. 2, we were interested in obtaining the probability of the occurrence of a single chance event. It will be recalled from Chap. 1 that in order to arrive at the end product, a test of a hypothesis in the representative statistical investigation, we needed the probabilities for a complete set of chance events. In that investigation the chance events were various sums of ranks, S. A table of the possible values which a chance event may assume with a corresponding probability for each value is called a probability distribution for the parent population. This distribution is given in Table 3.1 for the parent population of values of S. In this table, the variable x is used to represent the chance event S and in such cases is called a chance variable or variate.


Table 3.1
Probability Distribution of Sum of Ranks

   x     f(x)
  10     1/70
  11     1/70
  12     2/70
  13     3/70
  14     5/70
  15     5/70
  16     7/70
  17     7/70
  18     8/70
  19     7/70
  20     7/70
  21     5/70
  22     5/70
  23     3/70
  24     2/70
  25     1/70
  26     1/70


Ordinarily, in applied statistics, specification is accomplished by selecting a mathematical function, for example, the normal, binomial, or Poisson, on the basis of theoretical or empirical evidence and stating that the observations form a sample of all possible values of the variate. Quoting from R. A. Fisher: ". . . we may know by experience what forms are likely to be suitable, and the adequacy of our choice may be tested a posteriori. We must confine ourselves to those forms which we know how to handle, or for which any tables which may be necessary have been constructed."

3.2. Discrete Distributions. Functions like f(x) in Table 3.1 are called discrete probability distribution functions to distinguish distributions of this type from continuous probability distribution functions, to be discussed later. The various values of f(x) may be thought of as giving the relative frequencies of occurrence corresponding to the particular values of x. Since some one of the 17 events must occur on any one trial, the sum of all the probabilities is 1 or symbolically

$$\sum_{x} f(x) = 1. \qquad (1)$$

The distinguishing characteristic of the discrete distribution is that the variate x can take only isolated values, that is, in Table 3.1 only the whole numbers 10 through 26.
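A discrete distribution of this kind can be built by direct enumeration. The sketch below is an illustration, not the authors' procedure: it assumes that S is the sum of the ranks received by 4 items out of 8 ranked items, an assumption suggested by the 17 possible values 10 through 26 (the investigation itself is described in Chap. 1). Each of the C(8,4) selections is taken to be equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Assumed setup: ranks 1..8, of which 4 belong to the group whose ranks are summed.
rank_sums = Counter(sum(chosen) for chosen in combinations(range(1, 9), 4))
n_samples = sum(rank_sums.values())  # number of equally likely selections

# f(x): probability that the sum of ranks equals x.
f = {x: Fraction(count, n_samples) for x, count in sorted(rank_sums.items())}

for x, prob in f.items():
    print(x, f"{rank_sums[x]}/{n_samples}")
print("sum of probabilities:", sum(f.values()))
```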

3.3. The Binomial Distribution. This discrete distribution is the distribution of successes, x, in n repeated independent trials, in which the probability of success on any trial is a constant p. It has been named the binomial distribution because the successive probabilities are given by the respective terms of the expansion of the binomial $(q + p)^n$, where $q = 1 - p$. One property of the binomial theorem is that the $(x + 1)$st term of the expansion is

$$f(x) = C(n,x)\,p^x q^{n-x}, \quad 0 \le x \le n, \qquad (2)$$

which by the methods of Chap. 2 also gives the probability of exactly x successes in n trials. On the right side of (2), x is the variate, and p and n are parameters, that is, for any particular member of this family of distributions special values of p and n must be specified. Since

$$\sum_{x=0}^{n} f(x) = (q + p)^n = 1,$$

this distribution fulfills the requirement that the sum of the probabilities is 1.

Fig. 3.1. Graph of probabilities of obtaining x tenant farms. [f(x) axis given in parts of 243.]

Individual terms and partial sums for various numerical values of p, n, and x for the binomial distribution are given in reference 2.

Example 3.1. Suppose that the probability of drawing a tenant farm in a sample of farms is $\tfrac13$. If samples of 5 farms are drawn, then the respective probabilities of obtaining 0, 1, 2, 3, 4, 5 tenant farms in a single sample are $(\tfrac23)^5$; $5(\tfrac13)(\tfrac23)^4$; $10(\tfrac13)^2(\tfrac23)^3$; $10(\tfrac13)^3(\tfrac23)^2$; $5(\tfrac13)^4(\tfrac23)$; $(\tfrac13)^5$, or [32, 80, 80, 40, 10, 1]/243.

The probabilities f(x) of obtaining (x = 0, 1, 2, 3, 4, 5) tenant farms may be shown graphically as in Fig. 3.1.

If the probabilities are accumulated and graphed as in Fig. 3.2, then some F(a) value gives the probability of obtaining a value of x less than or equal to a, that is, $F(a) = \mathcal{P}(x \le a)$. This step function is called a cumulative distribution or an ogive.

Note that points of discontinuity occur for each whole number on the x axis and that $F(5) = \mathcal{P}(x \le 5) = 1$.

Fig. 3.2. Cumulative distribution of tenant-farm probabilities. [F(x) axis given in parts of 243.]
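The ordinates of Figs. 3.1 and 3.2 can be tabulated in a few lines. This is a sketch with illustrative variable names, using exact fractions so that both f(x) and F(x) come out in parts of 243 as in the figure captions.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n, p = 5, Fraction(1, 3)  # Example 3.1: 5 farms, tenant-farm probability 1/3
q = 1 - p

# f(x) = C(n,x) p^x q^(n-x) for x = 0, 1, ..., n
pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

# F(a) = P(x <= a): running totals of f(x)
cdf = list(accumulate(pmf))

pmf_parts = [int(value * 243) for value in pmf]
cdf_parts = [int(value * 243) for value in cdf]
print("f(x) in parts of 243:", pmf_parts)
print("F(x) in parts of 243:", cdf_parts)
```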

3.4. The Poisson Distribution. Another discrete distribution of importance in applied statistics is the Poisson distribution. The Poisson distribution may be derived as a limiting form of the binomial distribution when p is very small but n is so large that np is a finite constant, equal to m, say. To see this, consider the binomial distribution:

$$f(x) = C(n,x)\,p^x q^{n-x} = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\,p^x (1-p)^{n-x}.$$

Since $p = m/n$,

$$f(x) = \frac{(1 - 1/n)(1 - 2/n)\cdots[1 - (x-1)/n]\; m^x (1 - m/n)^n}{x!\,(1 - m/n)^x}.$$

Then

$$\lim_{n \to \infty} f(x) = \frac{e^{-m} m^x}{x!}.$$

Obviously, this function is also a function of x; hence the Poisson distribution may be written as

$$f(x) = e^{-m}\,\frac{m^x}{x!}, \quad 0 \le x < \infty.$$

Since $\sum_{x=0}^{\infty} m^x/x! = e^m$,

$$\sum_{x=0}^{\infty} f(x) = 1.$$

The distribution was named for Poisson, having been given first by him in 1837. Individual terms and partial sums for various numerical values of m and x for the Poisson distribution have been made available by Molina.

It should be noted that the Poisson distribution is a one-parameter family, m being the parameter.

Example 3.2. A bag of clover seed is known to contain 1 per cent weed seeds. A sample of 100 seeds is drawn. Since

$$m = np = 100(.01) = 1$$

and $e^{-1} = .3679$, the probabilities of 0, 1, 2, 3, . . . , weed seeds being in the sample are

  Number of weed seeds    Probability
          0                 .3679
          1                 .3679
          2                 .1839
          3                 .0613
          4                 .0153
          5                 .0031
          6                 .0005
          7                 .0001
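The Poisson probabilities of Example 3.2 can be recomputed and set beside the exact binomial values they approximate. The sketch below uses illustrative function names; the binomial column is an addition for comparison and is not tabulated in the text.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, m: float) -> float:
    """f(x) = e^(-m) m^x / x!"""
    return exp(-m) * m**x / factorial(x)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """f(x) = C(n,x) p^x (1-p)^(n-x)"""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 100, 0.01  # 100 seeds, 1 per cent weed seeds
m = n * p         # Poisson parameter m = np

rows = [
    (x, round(poisson_pmf(x, m), 4), round(binomial_pmf(x, n, p), 4))
    for x in range(8)
]
for x, poisson, binomial in rows:
    print(f"{x}  Poisson {poisson:.4f}  binomial {binomial:.4f}")
```

With n = 100 the two columns already agree to about two decimal places, which illustrates the limiting argument of Sec. 3.4.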

3.5. Continuous Distributions. If measurements instead of counts constitute the data under consideration, then the hypothetical parent population distribution is usually that of a continuous variate instead of a discrete variate. Snedecor gives the histogram of Fig. 3.3 for the gains in weight of 100 swine. Before powerful mathematical methods may be applied to derive a methodology providing techniques for statistical inferences, it is desirable to "idealize" the histogram into a curve which may be represented by a mathematical function. Such a process takes place in other branches of applied mathematics, for example, in surveying. Before the surveyor can be furnished with a powerful methodology for the solution of his practical problems in mensuration, it is necessary for the geometer to idealize the physical points, lines, and planes. A geometrical point is defined as having no dimensions but simply as an indicator of position. Again, a geometrical line has no width, and a geometrical plane has no thickness. With these idealized definitions and certain assumptions called axioms the geometer is able to prove theorems concerning relationships and properties of geometrical configurations. These theorems in turn form the bases for a practical surveying methodology.

How shall we idealize a histogram of the type given in Fig. 3.3? First 
of all, instead of a finite population of possible values of the variate, we 
assume an infinite population of gains in weight. Next, instead of the 
class marks differing by 5 pounds, suppose that they are selected closer 
and closer together. It is not difficult to see that the histogram might 
reasonably be expected to approach some continuous smooth curve of 
the type shown in Fig. 3.4. 



Fig. 3.3. Histogram showing frequency of gains in weight of 100 swine.

In Fig. 3.1 the probability of obtaining some particular x for the discrete distribution was represented by an ordinate. There, x took only whole-number values. For a continuous variate, because of the limitations imposed by measuring devices, the best that can be said concerning the "true" value of some observed x is that it lies in some interval (x, x + dx). If the area under the continuous curve be made equal to 1, corresponding to the similar requirement that the sum of the probabilities equal 1 for the discrete distributions, then the probability of x lying in the interval (x, x + dx) will be f(x) dx. This would be the theoretical probability corresponding to the empirical probability, say, of a gain of weight lying between 37.5 and 42.5, that is, 13/100 = .13.

The range of x may be thought of as extending from $-\infty$ to $+\infty$, even though the curve may actually contact the x axis at some finite value, since the area under the curve in the contact interval would be zero. As was pointed out, the total probability or area under the curve is symbolically

$$\int_{-\infty}^{\infty} f(x)\,dx = 1. \qquad (3)$$

The probability of x being equal to or less than some constant a is expressed as

$$\mathcal{P}(x \le a) = \int_{-\infty}^{a} f(x)\,dx. \qquad (4)$$

Again, the probability of x lying between a and b is given by

$$\mathcal{P}(a \le x \le b) = \int_{a}^{b} f(x)\,dx. \qquad (5)$$

It is possible to omit the equal signs in the left sides of (4) and (5), since the probability of obtaining any particular value of x corresponds to the area over a geometrical line, whose width is zero.



Fig. 3.4. "Idealized," or theoretical, probability distribution.

Fig. 3.5. Cumulative probability distribution for a continuous variate.

The cumulative probability distribution, or ogive, for the continuous variate corresponding to the probability distribution of Fig. 3.4 would appear as in Fig. 3.5. For this curve, $F(a) = \mathcal{P}(x \le a)$, and hence $F(b) - F(a) = \mathcal{P}(a \le x \le b)$.

3.6. The Normal Distribution. The most important continuous distribution in applied statistics is the normal distribution. The histogram of Fig. 3.3 and the theoretical distribution of Fig. 3.4 are those for data specified as being normally distributed. Data arising from many different measurements taken on plants and animals are specified as following the normal distribution. There is empirical justification for this assumption. Similarly, distributions of certain data of the physical and social sciences are found to be satisfactorily represented by the normal distribution. It should not be assumed, however, that every continuous distribution representing actual data should be normal. For example, it is known that the distribution of sizes of cumulus clouds should be represented by a U-shaped curve. The mathematical form of the normal curve is defined by

$$f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx, \quad -\infty < x < \infty. \qquad (6)$$

The curve is symmetrical about $x = \mu$ and bell-shaped as in Fig. 3.4. The inflection points are at $x = \mu \pm \sigma$, and the tails of the curve, although approaching the x axis quite rapidly, extend indefinitely far in both directions. The function (6) represents a two-parameter family, $\mu$ and $\sigma$, of continuous distributions, that is, as $\mu$ and $\sigma$ vary in magnitude a family of distributions is generated.

Since it can be shown for (6) that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1,$$

then the normal probability distribution has this same property in common with the binomial and the Poisson distributions.

Table I in the Appendix gives ordinates and areas for the normal distribution.
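The unit-area property and the tabulated areas can be checked numerically. The following sketch is not from the text: the values of mu and sigma are hypothetical, the midpoint rule stands in for the exact integration, and the area function is written in terms of the error function `math.erf` rather than read from Table I.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Ordinate of the normal curve, equation (6)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Area under the normal curve to the left of x, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def midpoint_area(f, a: float, b: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mu, sigma = 40.0, 10.0  # hypothetical parameter values for illustration

# Total area, integrating far enough into both tails to be effectively complete.
area = midpoint_area(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)

# Probability that x lies within one standard deviation of the mean.
within_one_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

print(round(area, 6), round(within_one_sigma, 4))
```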

3.7. Probability Distributions as Specialized Mathematical Functions.

We have noticed that theoretical probability distributions are mathematical functions possessing certain requirements. In order to give a complete formal definition of the requirements necessary for a mathematical function to be a probability distribution of statistics, it is convenient and sufficient to consider the cumulative distribution function, F(x). It is sufficient since, given the cumulative distribution, it is possible to find the distribution itself by taking the differential, that is,

$$d[F(x)] = f(x)\,dx.$$

A mathematical function, F(x), may be used as a cumulative distribution of a chance variable provided that

(a) $F(-\infty) = 0$, $F(+\infty) = 1$,
(b) F(x) is a nondecreasing function, that is, if $x_1 > x_2$, $F(x_1) \ge F(x_2)$,
(c) F(x) is defined at every point in a continuous range and is continuous, except possibly at a denumerable number of points.

The following notation should be kept clearly in mind:
f(x) is the frequency function.
F(x) is the cumulative distribution.

3.8. Some Mathematical Functions Useful as Probability Distributions.

Karl Pearson has suggested the differential equation

$$\frac{dy}{dx} = \frac{(d - x)\,y}{a + bx + cx^2}$$

as a generator of possible parent population distributions useful in applied statistics. For example, if $b = c = 0$, $d = \mu$, and $a = \sigma^2$, then the differential equation becomes

$$\frac{dy}{y} = \frac{(\mu - x)\,dx}{\sigma^2}.$$

Solving for y, we have

$$\log_e y = -\frac{(x - \mu)^2}{2\sigma^2} + \log_e k.$$

Then

$$y = k\,e^{-(x-\mu)^2/2\sigma^2}.$$

Upon setting the integral between the limits from $-\infty$ to $\infty$ equal to 1, we find

$$k = \frac{1}{\sigma\sqrt{2\pi}}.$$

Hence,

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty,$$

which is the normal frequency function. This is Type VII of the Pearson system of frequency functions.

Another method of obtaining a mathematical representation of a frequency function is furnished by the Gram-Charlier series. This latter method will not be discussed here, but the interested reader should consult the reference.

3.9. The Gamma and Beta Functions. These are two useful functions in statistics of which extensive use will be made in subsequent chapters. The Gamma function of the positive number n is defined by

$$\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\,dx, \quad n > 0.$$

The properties of the Gamma function will be exhibited in Exercises 3.1 to 3.6.

The incomplete Gamma function, defined by

$$F(x) = I_x(n) = \frac{1}{\Gamma(n)} \int_0^{x} t^{n-1} e^{-t}\,dt, \quad 0 \le x < \infty,$$

furnishes a useful cumulative distribution function which will be discussed in Chap. 7.

The Beta function, defined by

$$B(m,n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx, \quad m, n > 0,$$

is also of importance in theoretical and applied statistics. The properties of the Beta function will be exhibited in Exercises 3.7 to 3.9.

The cumulative incomplete Beta function is defined by

$$F(x) = I_x(m,n) = \frac{1}{B(m,n)} \int_0^{x} t^{m-1}(1 - t)^{n-1}\,dt = \frac{B_x(m,n)}{B(m,n)}, \quad 0 \le x \le 1.$$

Both $I_x(n)$ and $I_x(m,n)$ have been tabulated by Karl Pearson and his staff at the Biometric Laboratory, University College, London.
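The two definitions can be compared numerically. The sketch below is an illustration with assumed function names; it evaluates the Beta integral by the midpoint rule and compares it with the ratio of Gamma functions established in Exercise 3.9, using `math.gamma` from the Python standard library for the Gamma function.

```python
from math import gamma, pi


def beta_by_gamma(m: float, n: float) -> float:
    """B(m,n) from the Gamma-function relation of Exercise 3.9(c)."""
    return gamma(m) * gamma(n) / gamma(m + n)


def beta_by_integration(m: float, n: float, steps: int = 100000) -> float:
    """Midpoint-rule value of the integral of x^(m-1) (1-x)^(n-1) over (0, 1)."""
    h = 1.0 / steps
    return h * sum(
        ((i + 0.5) * h) ** (m - 1) * (1 - (i + 0.5) * h) ** (n - 1)
        for i in range(steps)
    )


b_gamma = beta_by_gamma(2, 3)
b_numeric = beta_by_integration(2, 3)
print(b_gamma, b_numeric)  # both should be close to 1/12

# Gamma(n + 1) = n! for a positive integer n, and [Gamma(1/2)]^2 = pi.
print(gamma(5), gamma(0.5) ** 2, pi)
```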


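EXERCISES

3.1. Use integration by parts to show that $\Gamma(n + 1) = n\Gamma(n)$.

3.2. Show that $\Gamma(n + 1) = n(n - 1) \cdots (n - k)\,\Gamma(n - k)$, where k is a positive integer less than n.

3.3. If n is also a positive integer, show that $\Gamma(n + 1) = n!$.

3.4. Show that $\Gamma(n)$ becomes infinite when n = 0.

3.5. Show that $\Gamma(n) = 2\int_0^{\infty} y^{2n-1} e^{-y^2}\,dy$ by setting $x = y^2$ in the integral defining the Gamma function.

3.6. Using the result of Exercise 3.5, show that

(a) $\Gamma(\tfrac12) = 2\int_0^{\infty} e^{-y^2}\,dy$,

(b) $[\Gamma(\tfrac12)]^2 = 4\int_0^{\infty}\!\int_0^{\infty} e^{-(x^2+y^2)}\,dx\,dy$,

(c) $\Gamma(\tfrac12) = \sqrt{\pi}$.

3.7. By setting $x = \sin^2\theta$ in the integral defining B(m,n), show that

$$B(m,n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta.$$

3.8. By setting x = 1 - y in the integral defining B(m,n) show that B(m,n) = B(n,m).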











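3.9. Show that

(a) $\Gamma(n)\Gamma(m) = 4\int_0^{\infty}\!\int_0^{\infty} x^{2n-1} y^{2m-1} e^{-(x^2+y^2)}\,dx\,dy$,

(b) $\Gamma(n)\Gamma(m) = 4\int_0^{\pi/2} \cos^{2n-1}\theta\,\sin^{2m-1}\theta\,d\theta \int_0^{\infty} r^{2(m+n)-1} e^{-r^2}\,dr$,

(c) $\Gamma(n)\Gamma(m) = B(m,n)\,\Gamma(m + n)$, or

$$B(m,n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}.$$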

3.10. Construct a parent population probability distribution for the sum of numbers appearing when two dice are tossed. Give (a) a table of x and f(x), (b) a graph of (a), and (c) the cumulative distribution graph.

3.11. If you have a set of random numbers from 0 to 999, how would 
you set up a sampling scheme to select a random sample of 50 from 490 
farms, the farms being numbered 0 to 489? 

3.12. Follow the instructions in Exercise 3.10 for the binomial distribu¬ 
tion with p = I, n = 5. 

3.13. Follow the instructions in Exercise 3.10 for the Poisson distribution with p = .002, n = 1,000.

3.14. Given the frequency function f(x) = 2x, 0 ≤ x ≤ 1, f(x) = 0 for x < 0 or x > 1. Show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. What is the functional form of the cumulative distribution function?

3.15. Show that the area under the curve is 1 for the triangular distri¬ 
bution function 

fix) = 0 <x<b. 


3.16. Repeat Exercise 3.15 for the rectangular distribution $f(x) = 1/w$, 0 ≤ x ≤ w.


3.17. Repeat Exercise 3.15 for the Cauchy distribution 

3.18. A random variable x, which lies between the limits 0 and 10, has 
the frequency function f(x) = Ax^. Determine the value of A so that 
the total probability is 1. What is the probability that x lies between 2 
and 5? That x is less than 3? 

3.19. A random variable follows the normal distribution. Determine the coordinates of the maximum point on the frequency curve, and show that $\mu \pm \sigma$ are the inflection points. Show that the area under the normal curve is 1.

3.20. A variate has the distribution/(x) = in the interval 0 < < co . 

The probability is ^ that x will exceed what value? 


3.21. If (?{x < rt'i) = 1 - 


1 

1 + .vf 


X being a continuous variate with 


range 0 < a: < find the frequency function/(:r). 

3.22. May/(.r) = — 1 /Or — 2 )^, 0 < a; < 4, serve as a probability dis¬ 
tribution function? Why? 

3.23. Use the table of areas of the normal curve to determine, for the normal distribution given in Exercise 3.19, the probability of (a) $x > \mu$, (b) $(\mu - \sigma) < x < (\mu + \sigma)$, (c) $x > \mu + 2\sigma$.

3.24. Give the Poisson approximation to the binomial distribution with n = 2,048 and p = 1/1,024. Hence, obtain the probabilities of there being 0, 1, 2, 3, . . . times that 10 tails appear in 2,048 tosses of 10 coins.

3.25. It can be shown for large n that the binomial distribution may be approximated by the normal distribution with $\mu = np$ and $\sigma^2 = npq$. If 20 coins are tossed, obtain the approximate probability of obtaining 8 or more heads.

3.26. For the distribution f(x) = 2x, 0 ≤ x ≤ 1, find the number a such that the probability of x > a is 3 times the probability of x < a.

3.27. If two values of x are drawn at random from the distribution 

f{x) = 0 < X < QOj what is the probability that both are greater 

than 1 ? 

3.28. In Exercise 3.27 what is the probability that at least one value of x is greater than 1?

CHAPTER 4

PROPERTIES OF UNIVARIATE DISTRIBUTION FUNCTIONS 


4.1. Introduction. In this chapter certain important properties of 
parent population distributions will be discussed. The discussion will also 
apply to similar properties of derived sampling distributions. These 
properties will be found useful in describing parent population distribu¬ 
tions and derived sampling distributions. 

4.2. Mathematical Expectation. The mathematical expectation of any random variable x, which can assume values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$, respectively, where $\sum_{i=1}^{n} p_i = 1$, is defined to be

$$E(x) = \sum_{i=1}^{n} x_i p_i.$$

For the discrete distribution this becomes

$$E(x) = \sum_{i=1}^{n} x_i f(x_i),$$

for distributions with $p_i = f(x_i)$. For the continuous distribution

$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

where as noted before f(x) may vanish over part of the range $(-\infty, +\infty)$.

The term "mathematical expectation" may be shortened to "expected value" or "average value."

The definition of the expected value of any random variable x may be generalized to include functions of x. The expected value of any function of x, say $\theta(x)$, is

$$E[\theta(x)] = \sum \theta(x) f(x) \quad \text{or} \quad \int \theta(x) f(x)\,dx,$$

over the range of x and depending upon whether x is a discrete or continuous variate.

It is possible, by introducing a generalized form of summation called a "Stieltjes integral," to replace the $\sum$ or $\int$ by a single integral sign. Although the concept of the Stieltjes integral simplifies statements concerning probabilities and expected values, the mathematical concepts behind this refinement are beyond the scope of this text. A distinction in all statements will be made between the discrete and continuous distributions.

If the indicated integration, necessary to obtain the expected value, cannot be performed directly, various methods of numerical integration are available.

Example 4.1. The expected value of x, where x gives the various sums possible to be made by throwing two ordinary dice, may be found from the following table:

  i            1   2   3   4   5   6   7   8   9   10   11
  x_i          2   3   4   5   6   7   8   9   10  11   12
  36 f(x_i)    1   2   3   4   5   6   5   4   3   2    1

Then,

$$E(x) = \sum_{i=1}^{11} x_i f(x_i) = \frac{2 + 6 + 12 + \cdots + 12}{36} = 7.$$


Example 4.2. If $f(x) = 3x^2$, 0 ≤ x ≤ 1, then

$$E(x) = \int_0^1 x \cdot 3x^2\,dx = \tfrac34.$$

Also,

$$E(x^2) = \int_0^1 x^2 \cdot 3x^2\,dx = \tfrac35.$$
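Both forms of the definition can be evaluated mechanically. The sketch below uses illustrative function names; it reproduces Example 4.1 exactly with fractions and approximates the integrals of Example 4.2 by the midpoint rule, one of the numerical integration methods alluded to in Sec. 4.2.

```python
from fractions import Fraction


def expected_discrete(values, probs, g=lambda x: x):
    """E[g(x)] = sum of g(x_i) f(x_i) for a discrete variate."""
    return sum(g(x) * p for x, p in zip(values, probs))


def expected_continuous(pdf, a, b, g=lambda x: x, steps=100000):
    """E[g(x)] = integral of g(x) f(x) dx over (a, b), by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) * pdf(a + (i + 0.5) * h) for i in range(steps))


# Example 4.1: sum of two dice, with 36 f(x) = 1, 2, ..., 6, ..., 2, 1.
sums = list(range(2, 13))
probs = [Fraction(6 - abs(x - 7), 36) for x in sums]
mean_dice = expected_discrete(sums, probs)


# Example 4.2: f(x) = 3x^2 on (0, 1).
def pdf(x):
    return 3 * x**2


mean_x = expected_continuous(pdf, 0.0, 1.0)
mean_x2 = expected_continuous(pdf, 0.0, 1.0, g=lambda x: x**2)

print(mean_dice, round(mean_x, 6), round(mean_x2, 6))
```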


4.3. Operations with Expected Values. The rules stated below will be found useful in operating with expected values.

1. The expected value of a constant is the constant itself. $E(c) = c$.

2. The expected value of a constant times a variable is the constant times the expected value of the variable. $E[c\,\theta(x)] = cE[\theta(x)]$.

3. The expected value of a sum (or difference) of two variates or functions is the sum (or difference) of the expected values of the separate parts. $E[\theta_1(x) \pm \theta_2(x)] = E[\theta_1(x)] \pm E[\theta_2(x)]$.

The proof of these statements is left as an exercise for the student.

4.4. Moments. The expected value of $x^k$ is called the kth moment of x about the origin and is represented by the symbol $\mu'_k$. Hence,

$$\mu'_k = E(x^k) = \sum x^k f(x) \quad \text{or} \quad \int x^k f(x)\,dx,$$

over the range of x. The first moment of x about the origin is referred to as the mean of x, and is denoted by $\mu'_1$. To simplify writing let $\mu'_1 = \mu$.

The expected value of $(x - \mu)^k$ is called the kth moment of x about the mean and is designated by the symbol $\mu_k$. Hence

$$\mu_k = E(x - \mu)^k = \sum (x - \mu)^k f(x) \quad \text{or} \quad \int (x - \mu)^k f(x)\,dx,$$

over the range of x. It is easily seen that

$$\mu_1 = E(x - \mu) = 0.$$

The second moment about the mean, $\mu_2$, is called the variance and is usually designated by the symbol $\sigma^2$. Hence,

$$\mu_2 = \sigma^2 = E(x - \mu)^2 = \sum (x - \mu)^2 f(x) \quad \text{or} \quad \int (x - \mu)^2 f(x)\,dx,$$

over the range of x. The formula for $\sigma^2$ may be written as follows:

$$\sigma^2 = E(x - \mu)^2 = E(x^2) - 2\mu E(x) + \mu^2 = E(x^2) - \mu^2 = \mu'_2 - \mu^2.$$

The square root of $\sigma^2$, or $\sigma$, is referred to as the standard deviation.

The third moment about the mean, $\mu_3$, furnishes a measure of skewness, or departure from symmetry about the mean of the distribution. One of the most generally accepted measures of skewness is the nondimensional quantity

$$\alpha_3 = \frac{\mu_3}{\sigma^3}.$$

It is seen that $\alpha_3$ will be zero for a symmetric distribution.

A measure of the relative flatness or peakedness of the distribution, called the kurtosis, is given by the nondimensional quantity

$$\alpha_4 = \frac{\mu_4}{\sigma^4}.$$


Example 4.3. Using the table of values of Example 4.1, we see that
$\mu = 7$. Also,

$$\mu'_2 = \frac{4 + 18 + 48 + \cdots + 144}{36} = \frac{1{,}974}{36}.$$

Hence,

$$\sigma^2 = \frac{1{,}974}{36} - 49 = \frac{35}{6}.$$
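The arithmetic of this example can be reproduced with a short sketch. It assumes the frequencies 36 f(x) from the table of Example 4.1; the helper names `raw_moment` and `variance` are illustrative only.

```python
from fractions import Fraction

# 36 * f(x) for x = 2, ..., 12, as tabulated in Example 4.1
counts = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}


def raw_moment(k):
    """kth moment about the origin, E(x^k)."""
    return sum(Fraction(c, 36) * x**k for x, c in counts.items())


def variance():
    """Second moment about the mean, E(x^2) - [E(x)]^2."""
    return raw_moment(2) - raw_moment(1) ** 2


print(raw_moment(1), raw_moment(2), variance())
```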


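Example 4.4. Using the distribution function of Example 4.2, that is,
$f(x) = 3x^2$, $0 < x < 1$,

$$\mu_2 = \frac{3}{5} - \frac{9}{16} = \frac{3}{80},$$

$$\mu_3 = \int_0^1 \left(x - \tfrac{3}{4}\right)^3 \cdot 3x^2\,dx = -\frac{1}{160},$$

and

$$\alpha_3 = \frac{\mu_3}{\sigma^3} = \frac{-1/160}{(3/80)^{3/2}} \approx -0.86.$$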






Example 4.5. In order that the function,


f{x) dx = dx, — < x < oo, 

represent a probability distribution, it is necessary that 

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Hence, we find k = Also, 

n = j ^ xf(x) dx = m, and (x — jjl)J{x) dx = 


Substituting these values in the original function, we find 

$$f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx, \qquad -\infty < x < \infty,$$

which is the normal distribution. 

Example 4.6. For the binomial distribution, 




and designate m(t) the moment generating function of x. If m(t) is
differentiated k times with respect to t and then evaluated at t = 0, we note





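$$m(t) = \sum_{x=0}^{n} e^{tx} C(n,x) p^x q^{n-x} = \sum_{x=0}^{n} C(n,x)(pe^t)^x q^{n-x} = (q + pe^t)^n.$$

Then,

$$\mu'_1 = \mu = \left. npe^t(q + pe^t)^{n-1} \right|_{t=0} = np,$$

$$\mu'_2 = \left. np\left[e^t(n-1)(q + pe^t)^{n-2}pe^t + (q + pe^t)^{n-1}e^t\right] \right|_{t=0} = np[(n-1)p + 1].$$

Hence,

$$\mu_2 = \sigma^2 = \mu'_2 - \mu^2 = np(1 - p) = npq.$$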
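The binomial results above can be verified numerically. This is a sketch only: it computes the moments directly from the frequency function rather than by differentiating m(t), and the values n = 10, p = 1/4 and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb, exp, isclose


def binomial_pmf(n, p):
    """f(x) = C(n, x) p^x q^(n-x) for x = 0, ..., n."""
    q = 1 - p
    return {x: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}


def raw_moment(pmf, k):
    """kth moment about the origin."""
    return sum(x**k * f for x, f in pmf.items())


def mgf(pmf, t):
    """m(t) = E(e^{tx}), evaluated by direct summation."""
    return sum(exp(t * x) * float(f) for x, f in pmf.items())


n, p = 10, Fraction(1, 4)
q = 1 - p
pmf = binomial_pmf(n, p)
mean = raw_moment(pmf, 1)
var = raw_moment(pmf, 2) - mean**2
print(mean, var)  # compare with np and npq
```

The direct summation `mgf(pmf, t)` can be compared with the closed form `(q + p * exp(t)) ** n` at any chosen t.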

We may obtain the following relation between M(t) and m(t):

$$M(t) = E[e^{t(x-\mu)}] = e^{-\mu t}[E(e^{tx})]$$

$$\therefore\ M(t) = e^{-\mu t} m(t).$$

Besides furnishing a simple method of obtaining the moments in certain
cases, the moment generating function is of use in deriving distribution
functions and in comparing distributions. These latter two uses follow
from Theorems 4.1 and 4.2 of Sec. 4.7.

The expected value of $e^{tx}$ may not exist for real values of t for many
discrete and continuous distributions, for example, for the Cauchy
distribution

$$f(x)\,dx = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$

A more general function, which can be proved always to exist, is the
characteristic function defined as $E(e^{itx})$, where t is real. The
characteristic function for the Cauchy distribution is $e^{-|t|}$. However, the
evaluation of the integral, necessary to obtain the characteristic function, makes
use of more advanced mathematical methods than are assumed for this
course. Our uses of Theorems 4.1 and 4.2 will be confined to the moment
generating function, which will be assumed to exist in such cases.

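4.6. Cumulants. Suppose that we define

$$\log m(t) = K(t) = \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \cdots + \kappa_r \frac{t^r}{r!} + \cdots. \qquad (1)$$

But

$$\log m(t) = t\mu + \log M(t)$$

and

$$M(t) = 1 + \sum_{i=2}^{\infty} \mu_i \frac{t^i}{i!}.$$

Hence

$$\log M(t) = \log\left[1 + \sum_{i=2}^{\infty} \mu_i \frac{t^i}{i!}\right],$$

$$\log m(t) = \mu t + \log\left[1 + \left(\mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \cdots\right)\right].$$

Using $\log(1 + x) = x - \dfrac{x^2}{2} + \dfrac{x^3}{3} - \cdots$,

$$\log m(t) = \mu t + \left(\mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \mu_4 \frac{t^4}{4!} + \cdots\right) - \frac{1}{2}\left(\mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \cdots\right)^2 + \cdots$$

$$= \mu t + \mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + (\mu_4 - 3\mu_2^2)\frac{t^4}{4!} + \cdots. \qquad (2)$$

Hence, equating coefficients of like powers of t in (1) and (2), we find
$\kappa_1 = \mu$; $\kappa_2 = \mu_2$, $\kappa_3 = \mu_3$, $\kappa_4 = \mu_4 - 3\mu_2^2$, etc. The function K(t) is called
the cumulant generating function.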
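As an illustration of the relations between cumulants and moments about the mean, the following sketch computes the first four cumulants of a discrete distribution. The test case (a variate taking the value 1 with probability 1/3 and 0 otherwise) and the function names are assumptions made for the example, not part of the text.

```python
from fractions import Fraction


def central_moment(pmf, k):
    """kth moment about the mean for a discrete distribution {x: f(x)}."""
    mu = sum(x * f for x, f in pmf.items())
    return sum((x - mu) ** k * f for x, f in pmf.items())


def first_four_cumulants(pmf):
    """kappa_1 = mu, kappa_2 = mu_2, kappa_3 = mu_3, kappa_4 = mu_4 - 3 mu_2^2."""
    mu = sum(x * f for x, f in pmf.items())
    mu2 = central_moment(pmf, 2)
    mu3 = central_moment(pmf, 3)
    mu4 = central_moment(pmf, 4)
    return mu, mu2, mu3, mu4 - 3 * mu2**2


p = Fraction(1, 3)
two_point = {0: 1 - p, 1: p}
print(first_four_cumulants(two_point))
```

For this two-point distribution the fourth cumulant is negative, which shows that cumulants, unlike even moments about the mean, need not be positive.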
Example 4.9. For the normal distribution


$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty,$$

$$m(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-(x-\mu)^2/2\sigma^2}\,dx.$$

Let $x = \mu + y$, then

$$m(t) = \frac{e^{t\mu}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ty - (y^2/2\sigma^2)}\,dy
= \frac{e^{t\mu}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y - t\sigma^2)^2/2\sigma^2}\, e^{t^2\sigma^2/2}\,dy
= e^{t\mu + t^2\sigma^2/2}.$$

Now, in order to read off the moments, we need the expansion of m(t) in
series, but this is not very simple. However, the cumulants may be
found quite simply, since for this case

$$K(t) = \log m(t) = t\mu + \frac{t^2\sigma^2}{2}.$$

But

$$K(t) = \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \kappa_3 \frac{t^3}{3!} + \cdots.$$

Hence, for the normal distribution

$$\kappa_1 = \mu, \qquad \kappa_2 = \sigma^2, \qquad \kappa_i = 0 \ \text{for}\ i > 2.$$

It is important to remember that all cumulants after and including $\kappa_3$
for the normal distribution are zero.

Hence, for the normal distribution, it is simpler to read the cumulants,
$\kappa_i$, from the cumulant generating function, K(t). If the moments are
desired, they may be obtained easily from the cumulants. Hence the
use of either the moment generating function or the cumulant generating
function depends on the form of the distribution function.
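The closed form of the normal moment generating function can be checked against a direct numerical integration of E(e^{tx}). The sketch below uses a simple midpoint rule; the integration limits, step count, parameter values, and function names are all assumptions chosen for illustration.

```python
from math import exp, isclose, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Normal frequency function with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def mgf_numeric(t, mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E(e^{tx}) over mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += exp(t * x) * normal_pdf(x, mu, sigma)
    return total * h


def mgf_closed_form(t, mu, sigma):
    """m(t) = exp(t mu + t^2 sigma^2 / 2)."""
    return exp(t * mu + t**2 * sigma**2 / 2)


print(mgf_numeric(0.5, 1.0, 2.0), mgf_closed_form(0.5, 1.0, 2.0))
```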

4.7. An Inverse Problem. It was seen that, if we are given the
theoretical distribution, then we may obtain a set of moments ($\mu$, $\mu_2$,
$\mu_3$, . . . ). In applied work we may have a large sample of observations
and wish to determine from the data some evidence regarding an
appropriate theoretical function to represent the assumed parent population
distribution.

From the table of values giving the empirical frequency distribution it
would be possible to obtain sample moments for the large sample. If it
be assumed that these sample moments are "good" estimates of the
corresponding moments of some theoretical distribution, then we have the
inverse problem of determining uniquely a theoretical distribution having
the given moments. This problem is discussed in texts on advanced
mathematical statistics and is beyond the scope of this course. The
"Pearson system of curves" is an assumed set of continuous functions
whose parameters may be expressed in terms of the moments. Hence,
estimates of these parameters may be obtained from a large sample and
some one of the theoretical curves "fitted" to the empirical frequency
distribution. A test of "goodness of fit" may then be accomplished by
the use of chi-square (see Chap. 12).

Closely related to the moment problem mentioned above is the inverse 
relation of the characteristic function or the moment generating function 
to a possible corresponding distribution function. Two theorems from 
advanced theoretical statistics will be stated without proof and made use 
of in subsequent derivations. 

Theorem 4.1. A distribution function is uniquely determined by its 
characteristic function or, where it exists, the moment generating function. 

Theorem 4.2. If a distribution function has a characteristic function
(moment generating function) which approaches the characteristic function
(moment generating function) of another distribution, the two distributions
approach each other.


EXERCISES 

4.1. Find the mathematical expectation for the following distribution: 


    y    10    20    30    40
    p    .1    .3    .5    .1


4.2. Given the following probability laws, find $\mu$ and $\sigma^2$ for each:

{a) f{x) — lOa:^ 0 < a: < 1, 

(6) f{x) = x/60, 0 <x < 10. 

4.3. A random variable can assume only two values 1 and 2. Its 
mathematical expectation is f. Find pi and p 2 . 

4.4. A random variable has the distribution function f{x) — A -f- Bx, 
0 < X < 1. The mathematical expectation is i. Find the constants 
A and B. 

4.5. Express $\mu_3$ and $\mu_4$ in terms of moments about zero.

4.6. Use the cumulants for the normal distribution to determine the 
first four moments about the mean. 

4.7. Two other measures of skewness and kurtosis, or departure from
normality, are $\gamma_1 = \kappa_3/(\kappa_2)^{3/2}$ and $\gamma_2 = \kappa_4/\kappa_2^2$. Show that $\gamma_1 = \alpha_3$ and
$\gamma_2 = \alpha_4 - 3$. Find $\gamma_1$ and $\gamma_2$ for the normal distribution.

4.8. Determine the first four cumulants for the binomial distribution.
Verify that $\kappa_{r+1} = pq\,(d\kappa_r/dp)$, $r \ge 1$, for the cumulants obtained.

4.9. Show that $E(x - x_1)^2$ is a minimum if $x_1 = E(x)$.

4.10. If x has the distribution function $f(x) = \tfrac{1}{2}$ on the interval (0,2),
find the moment generating function of x, and determine the variance
of x. This is called a rectangular distribution.

4.11. An unbiased penny is tossed 64 times. Find (a) the expected
number of heads, (b) the theoretical standard deviation.

4.12. A pair of dice is thrown 60 times. Find (a) the expected number
of times that the sum 10 appears, (b) the expected value of the square of
the standard deviation.

4.13. There are 6 urns containing, respectively, 1 white, 9 black;
2 white, 8 black; 3 white, 7 black; 4 white, 6 black; 5 white, 5 black;
4 white, 6 black balls. One ball is to be drawn from each urn. What
is the expected number of white balls taken? Let $x_i$ be a variable which
assumes the values 1 or 0 according to whether the ith trial results in
success or failure. Then, $m = x_1 + x_2 + \cdots + x_n$ is the number of
successes in n trials. But $E(x_i) = p_i \cdot 1 + (1 - p_i) \cdot 0 = p_i$ for i = 1,
2, . . . , n. Hence $E(m) = p_1 + p_2 + \cdots + p_n$.

4.14. An urn contains a red balls and b black balls, and c balls are drawn 
simultaneously. What is the expected number of red balls drawn? 

4.15. An urn contains r tickets numbered from 1 to r, and s tickets are 
drawn at a time. What is the expected sum of the numbers on the tickets 
drawn? Let Xi be the variable attached to the tth ticket which may 
assume any of the values 1, 2, . . . , r. Then, 

= 0) (1 + 2 + • • • + r). 

Set m ~ Xi X 2 ' ' ’ + ocs, and complete the solution by finding 
E{m). 

4.16. Find the moment generating function m{t) for the triangular 
distribution sketched in the accompanying figure. 


fix) 



4.17. Find the moment generating function m{t) for the rectangular 
distribution f{x) = l/t^, 0 < a; < ic. Verify that when w = \, the 
square of the moment generating function of the rectangular distribution 
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equals the moment generating function of the triangular distribution in 
Exercise 4.16. 

4.18. Find $\mu_2$, $\mu_3$, $\mu_4$ for the binomial distribution using the formulas
for the definitions of these moments. Note: $x^2 = x(x - 1) + x$,

$x^3 = x(x - 1)(x - 2) + 3x^2 - 2x$,

and $x^4 = x(x - 1)(x - 2)(x - 3) + 6x^3 - 11x^2 + 6x$.

4.19. Find and a 4 for the binomial distribution. 

4.20. Find ii, as, and for the following binomials: + f)^ 

(i + 1)^^- 

4.21. Show that the moment generating function for the Poisson
distribution is $m(t) = e^{m(e^t - 1)}$.

4.22. Show that $\kappa_i = m$ (i = 1, 2, 3, . . .) for the Poisson distribution.

4.23. Find m? mL M 4 for the Poisson distribution, {a) using the for¬ 
mulas for the definitions of the moments, (6) using the moment generating 
function. 

4.24. Find 1 x 2 , as, and a 4 for the Poisson distribution. 

4.25. Prove the general formula connecting the moments about $\mu$ with
the moments about the origin:

$$\mu_k = \mu'_k - \binom{k}{1}\mu\,\mu'_{k-1} + \binom{k}{2}\mu^2\mu'_{k-2} - \cdots.$$

Use the formula to obtain 

Ml = 0, 


CHAPTER 5 


BIVARIATE AND MULTIVARIATE DISTRIBUTIONS AND THEIR 

PROPERTIES 

5.1. Introduction. In the previous chapters single-variate, or
univariate, distributions and their properties have been discussed. It is
proposed now to extend the discussion to cases of two or more variates,
that is, bivariate and multivariate distributions. The discussion will apply
alike to bivariate and multivariate parent population distributions and
bivariate and multivariate derived sampling distributions. The latter
distributions will be discussed in Chap. 6.

5.2. Discrete Bivariate Distributions. Suppose that for every value
of a given variate, x, we also know the values which a second variate, y,
can take. Then it will be possible to construct a joint probability
distribution, from which can be obtained the probability that any
combination of x and y will occur in random draws. The bivariate frequency
function will be represented symbolically by f(x,y). The conditions
required for a mathematical function F(x,y) to be used as a cumulative
joint probability distribution are analogous to those in Sec. 3.7 for the
univariate case. These conditions imply the following conditions for f(x,y):

(a) f(x,y) is nonnegative over the (x,y) plane,

(b) $\sum_W f(x,y) = 1$, where W is the entire (x,y) region,

(c) $\sum_w f(x,y)$ can be computed for any subregion, w, of W.

Example 5.1. Consider the two-dice problem. Let x represent the
number of spots showing on die 1 and y the number on die 2 in any one
toss of the two dice. The joint probability distribution is given by Table
5.1 (p = 1/36). The frequency function is f(x,y) = 1/36 for any pair of
values, (x,y), for x, y = 1, 2, . . . , 6. Note that $\sum f(x,y) = 1$ over the
ranges of both variates. The probability that x
and y lie, at the same time, in ranges $a \le x \le b$, $c \le y \le d$ is given by

$$P(w) = \sum_w f(x,y), \qquad (1)$$

where w is the subregion defined by $a \le x \le b$, $c \le y \le d$. In the
two-dice problem

$$P(1 \le x \le 3,\ 2 \le y \le 4) = \tfrac{9}{36} = \tfrac{1}{4}.$$

We may define a cumulative distribution function for the bivariate case
in a manner similar to that for P(w) in (1) above. If F(b,d) represents the
probability that $x \le b$, $y \le d$, then,

$$F(b,d) = \sum_{x \le b} \sum_{y \le d} f(x,y). \qquad (2)$$

Summing the joint frequency function over all values of y gives

$$g(x) = \sum_y f(x,y), \qquad (4)$$

which is called the marginal distribution of x. Similarly, the marginal
distribution of y is

$$h(y) = \sum_x f(x,y). \qquad (5)$$

X 

As can be seen from the border totals in Table 5.1, the distributions (4) 
and (5) may be exhibited as in Table 5.2. 

Table 5.2

Marginal Distributions of x and y in Two-dice Problem

    x = y           1     2     3     4     5     6
    g(x) = h(y)    1/6   1/6   1/6   1/6   1/6   1/6



Finally, the marginal distribution and the joint distribution may be
used to define the conditional distribution, corresponding to the conditional
probability discussed in Sec. 2.5. Using the notation f(y|x) to mean the
probability of y, given x, we know that

$$f(y|x) = f(x,y)/g(x), \qquad g(x) \ne 0, \qquad (6)$$

since

$$f(x,y) = g(x) f(y|x)$$

by law 3 of Sec. 2.5. The distribution (6) is called the conditional
distribution of y. Similarly, the conditional distribution of x is

$$f(x|y) = f(x,y)/h(y), \qquad h(y) \ne 0. \qquad (7)$$

Now, if f(y|x) does not depend on x, then y and x are said to be
independent variates, since

$$f(x,y) = g(x) \cdot h(y). \qquad (8)$$

This is true for the two-dice problem since for any $x = x_i$, $f(y|x_i) = \tfrac{1}{6}$.
Hence, by (8),

$$f(x,y) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36},$$

for x, y = 1, 2, . . . , 6.
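The marginal and conditional distributions of the two-dice problem, and the independence relation (8), can be reproduced as follows. This is a minimal sketch; the dictionary representation of f(x,y) and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# joint[(x, y)] = f(x, y) = 1/36 for the two-dice problem
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}


def marginal_x(joint):
    """g(x): sum f(x, y) over y."""
    g = {}
    for (x, _), f in joint.items():
        g[x] = g.get(x, Fraction(0)) + f
    return g


def marginal_y(joint):
    """h(y): sum f(x, y) over x."""
    h = {}
    for (_, y), f in joint.items():
        h[y] = h.get(y, Fraction(0)) + f
    return h


def conditional_y_given_x(joint, x):
    """f(y | x) = f(x, y) / g(x), defined when g(x) is not zero."""
    gx = marginal_x(joint)[x]
    return {y: f / gx for (xx, y), f in joint.items() if xx == x}


def is_independent(joint):
    """True when f(x, y) = g(x) h(y) for every pair of values."""
    g, h = marginal_x(joint), marginal_y(joint)
    return all(joint.get((x, y), Fraction(0)) == g[x] * h[y] for x in g for y in h)


def region_probability(joint, a, b, c, d):
    """P(a <= x <= b, c <= y <= d)."""
    return sum(f for (x, y), f in joint.items() if a <= x <= b and c <= y <= d)


print(marginal_x(joint)[1], conditional_y_given_x(joint, 3)[5], is_independent(joint))
```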

5.3. Continuous Bivariate Distribution. In the case of two
continuous variates, x and y, the probability that x is in the interval (x, x + dx)
and y in the interval (y, y + dy) is

$$f(x,y)\,dx\,dy. \qquad (9)$$

The graph of z = f(x,y) is called the frequency surface. The frequency
function f(x,y) is nonnegative, but it may be zero over certain subregions
of the (x,y) plane.
The probability that (x,y) will fall in some subregion w of the (x,y) plane
W is given by

$$P(w) = \iint_w f(x,y)\,dx\,dy, \qquad (10)$$

since

$$\iint_W f(x,y)\,dx\,dy = 1. \qquad (11)$$

The cumulative distribution function is given by

$$F(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(x,y)\,dy\,dx. \qquad (12)$$

The marginal distribution of x is given by

$$g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy,$$

and similarly for h(y).

Finally, the conditional probability that y lies in the interval (y, y + dy),
given that x is in the interval (x, x + dx), is

$$f(y|x)\,dy = \frac{f(x,y)\,dy\,dx}{g(x)\,dx} = \frac{f(x,y)\,dy}{g(x)}. \qquad (13)$$

Again, if f(y|x) is independent of x, that is, if the right side of (13) does not
contain x after algebraic simplification, then x and y are said to be
independent variates and

$$f(x,y) = g(x) \cdot h(y).$$

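Example 5.2. Given $f(x,y)\,dx\,dy = e^{-x-y}\,dx\,dy$, (x, y > 0).

(a) $F(x,y) = \int_0^x \int_0^y e^{-x-y}\,dy\,dx$. For $x = x_1 = 1$, $y = y_1 = 1$,

$$F(1,1) = P(x, y \le 1) = (1 - e^{-1})(1 - e^{-1}) = .3996.$$

(b) $g(x) = \int_0^\infty e^{-x-y}\,dy = e^{-x}$.

(c) $f(y|x) = \dfrac{e^{-x-y}}{e^{-x}} = e^{-y}$ (independent of x).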
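The value of F(1,1) in part (a) can be confirmed numerically. The sketch below assumes the frequency function f(x,y) = exp(-x - y) for x, y > 0 and compares the closed form with a midpoint-rule double integral; the function names and grid size are illustrative.

```python
from math import exp, isclose


def cdf(x, y):
    """Closed form F(x, y) = (1 - e^{-x})(1 - e^{-y}) for f(x, y) = e^{-x-y}."""
    return (1 - exp(-x)) * (1 - exp(-y))


def cdf_numeric(x, y, steps=400):
    """Midpoint-rule approximation of the double integral of e^{-s-t} over (0,x) x (0,y)."""
    hx, hy = x / steps, y / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * hx
        for j in range(steps):
            t = (j + 0.5) * hy
            total += exp(-s - t)
    return total * hx * hy


print(round(cdf(1, 1), 4))
```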

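Example 5.3. The normal bivariate distribution with means of x
and y both zero is

$$f(x,y)\,dy\,dx = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}
\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho xy}{\sigma_x\sigma_y}\right]\right\} dy\,dx,$$

where $-\infty < x < \infty$, $-\infty < y < \infty$.

(a) $$g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-x^2/2\sigma_x^2}.$$

(b) $$f(y|x) = \frac{f(x,y)}{g(x)} = \frac{1}{\sigma_y\sqrt{2\pi(1 - \rho^2)}}
\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{y}{\sigma_y} - \frac{\rho x}{\sigma_x}\right]^2\right\}.$$

If $\rho = 0$,

$$f(y|x) = \frac{1}{\sigma_y\sqrt{2\pi}}\, e^{-y^2/2\sigma_y^2} = h(y);$$

hence, in this case, x and y are independent. Hence, $\rho$ may be used as a
measure of relationship between x and y. It is, in fact, the population
correlation coefficient between x and y.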

5.4. Distribution of Functions of Discrete Variates. In order to obtain
certain properties of bivariate and multivariate discrete or continuous
distributions, such as expected values, moments, and moment generating
functions, it is sometimes necessary to make transformations of the
variates so that the summations or integrals may be evaluated. The
discrete case will be considered first.

The distribution of a function of x, say $z = \psi(x)$, given the distribution
of x, is simple if there is a one-to-one correspondence between x and z,
that is, if for every value of x there is only one value of z, and vice versa.
In this case, the same probabilities hold for z as for x. For example,
consider a single die with f(x) = 1/6 (x = 1, 2, . . . , 6). Suppose $z = x^2$. In
general, there is not a one-to-one correspondence between x and z, because
$x = \pm\sqrt{z}$, resulting in two values of x for each z. However, in our die
problem, x must be positive; hence $x = +\sqrt{z}$ only. Therefore,
f(z) = 1/6, for z = 1, 4, . . . , 36.

On the other hand, suppose $z = (x - 1)(x - 2)$, or $x = (3 \pm \sqrt{1 + 4z})/2$.
Then, even for x always positive, there is not a one-to-one correspondence
between x and z, since, say, for z = 0, x = 1 or 2. Hence, in this case

    f(z) = 2/6  for z = 0, for two integral values of x in the range 1 <= x <= 6,
         = 1/6  for z = 2, 6, 12, 20, for one integral value of x in the range 1 <= x <= 6,
         = 0    elsewhere, no integral value of x in the range 1 <= x <= 6.

Again, if $z = (x - 1)(x - 2) \cdots (x - 6)$,

    f(z) = 1  for z = 0, for six values of x in the range 1 <= x <= 6,
         = 0  elsewhere, no values of x in the range 1 <= x <= 6.

If we consider a bivariate distribution, such as that of the two-dice
problem, the distribution of a function of the two variates is still simple.
For example, consider the distribution f(w), where w = xy; then

    f(1)  = 1/36,
    f(2)  = f(2,1) + f(1,2) = 2/36,
    f(3)  = f(3,1) + f(1,3) = 2/36,
    f(4)  = f(4,1) + f(2,2) + f(1,4) = 3/36,
    . . .
    f(7)  = 0,
    . . .
    f(36) = f(6,6) = 1/36.

Here again $\sum f(w) = 1$ over the possible range.
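The three discrete examples of this section can be tabulated mechanically by accumulating the probabilities of all values of the original variate that lead to the same value of the function. The helper `distribution_of` below is a hypothetical name introduced for this sketch.

```python
from fractions import Fraction
from itertools import product


def distribution_of(func, pmf):
    """Frequency function of z = func(value), summing f over all values mapping to z."""
    out = {}
    for value, prob in pmf.items():
        z = func(value)
        out[z] = out.get(z, Fraction(0)) + prob
    return out


die = {x: Fraction(1, 6) for x in range(1, 7)}
two_dice = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

f_square = distribution_of(lambda x: x * x, die)             # z = x^2
f_quad = distribution_of(lambda x: (x - 1) * (x - 2), die)   # z = (x - 1)(x - 2)
f_prod = distribution_of(lambda xy: xy[0] * xy[1], two_dice)  # w = xy

print(f_quad)
print(f_prod[4], f_prod.get(7, 0), sum(f_prod.values()))
```

Values of w that cannot occur, such as 7, are simply absent from the resulting dictionary, which corresponds to f(7) = 0.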

5.5. Distributions of Functions of Continuous Variates. The
distribution of x is f(x) dx, x defined in the range $x_1$ to $x_2$. We seek the
distribution of z = z(x). If there is a one-to-one correspondence between
x and z and x can be solved uniquely in terms of z, then $x = \psi(z)$,
$dx = \psi'(z)\,dz$, $f(x) = f[\psi(z)]$, and the limits are $z_1 = z(x_1)$, $z_2 = z(x_2)$. The
probability distribution of z, with those conditions, will be $f[\psi(z)][\psi'(z)]\,dz$
over the range $z_1$ to $z_2$.

Example 5.4. If f(x) dx = 2(1 - x) dx for 0 < x < 1, we find the
distribution of z, where $z = x^2$, as follows:

Since x is always positive, there is a one-to-one correspondence between
z and x, that is, $x = +\sqrt{z}$; hence

$$f(z)\,dz = 2(1 - \sqrt{z})\,\frac{1}{2\sqrt{z}}\,dz,$$

or

$$f(z)\,dz = (z^{-1/2} - 1)\,dz, \qquad 0 < z < 1.$$
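One way to check a change of variable of this kind is to compare the derived frequency function with the numerical derivative of the cumulative distribution of z, which is obtained from that of x by substituting the positive square root of z. The sketch below assumes f(x) = 2(1 - x) on (0, 1) and z = x^2 as in Example 5.4; the function names and step size are illustrative.

```python
from math import isclose, sqrt


def cdf_x(x):
    """P(X <= x) for f(x) = 2(1 - x), 0 < x < 1."""
    return 2 * x - x * x


def cdf_z(z):
    """P(Z <= z) for z = x^2 with x positive: P(X <= sqrt(z))."""
    return cdf_x(sqrt(z))


def density_z(z):
    """Frequency function derived in the text: z^(-1/2) - 1, 0 < z < 1."""
    return z**-0.5 - 1


def numeric_derivative(func, z, h=1e-6):
    """Central-difference approximation of the derivative of func at z."""
    return (func(z + h) - func(z - h)) / (2 * h)


for z in (0.1, 0.25, 0.7):
    print(z, density_z(z), numeric_derivative(cdf_z, z))
```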

Example 5.5. If $f(x)\,dx = \dfrac{1 + x}{2}\,dx$, $-1 < x < 1$, then $x = \pm\sqrt{z}$
for $z = x^2$. For positive values of x, $x = +\sqrt{z}$, and for negative values
of x, $x = -\sqrt{z}$. In this case

    f(z) dz = (1/4)(z^(-1/2) + 1) dz,   x > 0,
            = (1/4)(z^(-1/2) - 1) dz,   x < 0,      0 < z < 1.

If we wish, we may add the two functions to obtain a single function
$f(z)\,dz = \tfrac{1}{2} z^{-1/2}\,dz$, $0 < z < 1$,
but this is not possible in all cases.

The distribution of a function of two continuous variates, x and y, is
more complex mathematically. We wish to derive the simultaneous
distribution of u and v, f(u,v) du dv, where u = u(x,y) and v = v(x,y). In
case we wish only the distribution of u, then v is integrated out between
its limits of integration, leaving some f(u) du. In such cases it is
necessary in general to assume a one-to-one correspondence between (x,y) and
(u,v). It will then be possible to make the inverse solution:

x = x(u,v), and y = y(u,v).

The probability distribution f(x,y) dx dy then becomes

$$f[x(u,v),\ y(u,v)]\,|J|\,du\,dv,$$

where J is the Jacobian of the transformation, that is,

$$J = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix}
\qquad \text{or} \qquad
\frac{1}{J} = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial v}{\partial x} \\[2mm] \dfrac{\partial u}{\partial y} & \dfrac{\partial v}{\partial y} \end{vmatrix}.$$

This implies, of course, that these partial derivatives exist. If the second
form is used, then J must be evaluated at x = x(u,v), y = y(u,v).

The limits for u and v must be determined individually for each
problem. The limits for the first variable in the integral may be functions of
the second variable.

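Example 5.6. Let us find f(u,v) du dv, where $f(x,y)\,dx\,dy = e^{-(x+y)}\,dx\,dy$,
$0 < x < \infty$, $0 < y < \infty$, u = x + y, v = x/y.

Then,

$$x = \frac{uv}{1 + v}, \qquad y = \frac{u}{1 + v},$$

and

$$J = \begin{vmatrix} \dfrac{v}{1 + v} & \dfrac{1}{1 + v} \\[2mm] \dfrac{u}{(1 + v)^2} & \dfrac{-u}{(1 + v)^2} \end{vmatrix} = \frac{-u}{(1 + v)^2},$$

or

$$\frac{1}{J} = \begin{vmatrix} 1 & \dfrac{1}{y} \\[2mm] 1 & \dfrac{-x}{y^2} \end{vmatrix} = -\frac{x + y}{y^2} = -\frac{(1 + v)^2}{u}.$$

$$\therefore\ f(u,v)\,du\,dv = e^{-u}\,\frac{u}{(1 + v)^2}\,du\,dv, \qquad 0 < u < \infty,\ 0 < v < \infty.$$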
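A Jacobian obtained by hand can be checked with finite differences. The sketch below assumes the transformation u = x + y, v = x/y, with inverse x = uv/(1 + v), y = u/(1 + v), and compares a central-difference determinant with the closed form -u/(1 + v)^2. The function names and the step size are illustrative.

```python
from math import isclose


def x_of(u, v):
    """Inverse solution x = uv / (1 + v)."""
    return u * v / (1 + v)


def y_of(u, v):
    """Inverse solution y = u / (1 + v)."""
    return u / (1 + v)


def jacobian_numeric(u, v, h=1e-6):
    """Determinant of the partial derivatives of (x, y) with respect to (u, v)."""
    dx_du = (x_of(u + h, v) - x_of(u - h, v)) / (2 * h)
    dx_dv = (x_of(u, v + h) - x_of(u, v - h)) / (2 * h)
    dy_du = (y_of(u + h, v) - y_of(u - h, v)) / (2 * h)
    dy_dv = (y_of(u, v + h) - y_of(u, v - h)) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du


def jacobian_closed_form(u, v):
    """J = -u / (1 + v)^2; its absolute value multiplies the frequency function."""
    return -u / (1 + v) ** 2


print(jacobian_numeric(2.0, 0.5), jacobian_closed_form(2.0, 0.5))
```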






Fig. 5.1. Regions of integration for Example 5.7.

Since x = (u + v)/2, y = (u - v)/2, then the boundary x = y becomes
v = 0; y = 1 becomes u - v = 2; and x = 0 becomes u = -v. We
now plot the new region determined by these new boundaries on the (u,v)
plane as in Fig. 5.1. In the new region, u varies from -v to v + 2, and
v varies from 0 to -1. Hence the new limits are -v < u < v + 2,
-1 < v < 0.

5.6. Expected Values for Bivariate Distributions. For any bivariate
frequency function, f(x,y), the expected value of any function of x and y,
say $\psi(x,y)$, is

$$E[\psi(x,y)] = \sum\sum_W \psi(x,y) f(x,y) \quad \text{or} \quad \iint_W \psi(x,y) f(x,y)\,dx\,dy,$$

where W is the entire region of (x,y). The following specializations of
$\psi(x,y)$ will enable us to derive some simple rules for operating with
expected values. Continuous variates will be used in the derivations,
but similar rules will hold for discrete variates.

(1) $E(c) = c$, where c is a constant.

(2) $E(cx) = c \int_{-\infty}^{\infty} x \left[\int_{-\infty}^{\infty} f(x,y)\,dy\right] dx = c \int_{-\infty}^{\infty} x g(x)\,dx = cE(x)$.

(3) $E(x + y) = \int_{-\infty}^{\infty} x \left[\int_{-\infty}^{\infty} f(x,y)\,dy\right] dx + \int_{-\infty}^{\infty} y \left[\int_{-\infty}^{\infty} f(x,y)\,dx\right] dy$
$= \int_{-\infty}^{\infty} x g(x)\,dx + \int_{-\infty}^{\infty} y h(y)\,dy = E(x) + E(y)$.

(4) $E(xy) = \int_{-\infty}^{\infty} x \left[\int_{-\infty}^{\infty} y f(x,y)\,dy\right] dx = \int_{-\infty}^{\infty} x h_1(x)\,dx$,
where

$$h_1(x) = \int_{-\infty}^{\infty} y f(x,y)\,dy.$$

Note that E(xy) can be evaluated, if the integrations can be performed,
even though x and y are not independent.

(5) If x and y are independent, then $f(x,y) = g(x) \cdot h(y)$ and hence

$$E(xy) = \int_{-\infty}^{\infty} x g(x)\,dx \int_{-\infty}^{\infty} y h(y)\,dy = E(x)E(y).$$

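5.7. Moments. The product moment of $x^r y^s$ about the origin is given by

$$\mu'_{rs} = E(x^r y^s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^r y^s f(x,y)\,dx\,dy.$$

Let the mean of x be $\mu'_{10}$ and the mean of y be $\mu'_{01}$; then

$$\mu_{rs} = E[(x - \mu'_{10})^r (y - \mu'_{01})^s] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu'_{10})^r (y - \mu'_{01})^s f(x,y)\,dx\,dy.$$

Example 5.8. To find $\sigma_x^2$, defined as equal to $\mu_{20}$, we have

$$\mu_{20} = E(x - \mu'_{10})^2 = \mu'_{20} - (\mu'_{10})^2.$$
$$\therefore\ \sigma_x^2 = \mu'_{20} - (\mu'_{10})^2.$$

Similarly,

$$\sigma_y^2 = \mu_{02} = \mu'_{02} - (\mu'_{01})^2.$$

Example 5.9. To find $\sigma_{xy}$, defined as equal to $\mu_{11}$ and called the
covariance of x and y, we have

$$\mu_{11} = E[(x - \mu'_{10})(y - \mu'_{01})] = E(xy) - \mu'_{10}\mu'_{01}.$$
$$\therefore\ \sigma_{xy} = \mu'_{11} - \mu'_{10}\mu'_{01}.$$

If x and y are independent, $E(xy) = \mu'_{11} = \mu'_{10}\mu'_{01}$; hence $\sigma_{xy} = 0$ for this
condition. The correlation, $\rho_{xy}$, between x and y is defined as the
nondimensional quantity

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}.$$

It can be shown that $-1 \le \rho \le 1$. Also, if $\sigma_{xy} = 0$, then $\rho_{xy} = 0$.

Example 5.10. To find the variance of (x + y), we have

$$\sigma_{x+y}^2 = E[x - \mu'_{10} + y - \mu'_{01}]^2 = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}
= \sigma_x^2 + \sigma_y^2 + 2\rho\sigma_x\sigma_y.$$

If x and y are independent, so that $\sigma_{xy} = 0$, then $\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2$.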

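Example 5.11. To find E(xy) for the bivariate normal distribution,
with $\mu'_{10} = 0$, $\mu'_{01} = 0$, we have

$$E(xy) = k \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, e^{-\theta}\,dx\,dy,$$

where

$$k = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}, \qquad
\theta = \frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right].$$

Note that for the bivariate normal, $\rho_{xy}$ has been abbreviated to $\rho$.
The function $\theta$ may be written

$$\theta = \frac{1}{2}\left[\frac{t^2}{1 - \rho^2} + \frac{y^2}{\sigma_y^2}\right],
\qquad \text{where } t = \frac{x}{\sigma_x} - \frac{\rho y}{\sigma_y}.$$

Then, using the methods of Sec. 5.5,

$$E(xy) = k \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,\frac{\sigma_x^2 (t\sigma_y + \rho y)}{\sigma_y}\, e^{-t^2/2(1-\rho^2)}\, e^{-y^2/2\sigma_y^2}\,dt\,dy
= \frac{\rho\sigma_x}{\sigma_y^2\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-y^2/2\sigma_y^2}\,dy = \rho\sigma_x\sigma_y.$$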



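Example 5.12. Consider

$$f(x,y)\,dx\,dy = e^{-x-y}\,dx\,dy, \qquad 0 < x < \infty,\ 0 < y < \infty.$$

Then,

$$E(x) = \mu'_{10} = \int_0^\infty e^{-y}\left[\int_0^\infty x e^{-x}\,dx\right] dy = 1,$$

$$E(y) = \mu'_{01} = 1,$$

$$\mu'_{20} = \mu'_{02} = 2,$$

$$\sigma_x^2 = \sigma_y^2 = 2 - 1 = 1,$$

$$\mu'_{11} = \int_0^\infty x e^{-x}\,dx \int_0^\infty y e^{-y}\,dy = 1,$$

$$\sigma_{xy} = \mu'_{11} - \mu'_{10}\mu'_{01} = 0,$$

and

$$\rho_{xy} = 0.$$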

Example 5.13. Consider the following discrete bivariate distribution
with p = 1/12.

               y = 0    y = 1    y = 2    Total
    x = 0       2p       2p       2p       6p
    x = 1        p       4p        p       6p
    Total       3p       6p       3p      12p

Then,

$$\mu'_{10} = \sum_{y=0}^{2}\sum_{x=0}^{1} x f(x,y) = .5, \qquad \mu'_{01} = \sum_{y=0}^{2}\sum_{x=0}^{1} y f(x,y) = 1.0,$$

$$\mu'_{20} = \sum_{y=0}^{2}\sum_{x=0}^{1} x^2 f(x,y) = .5, \qquad \mu'_{02} = \sum_{y=0}^{2}\sum_{x=0}^{1} y^2 f(x,y) = 1.5,$$

$$\sigma_x^2 = .5 - .25 = .25, \qquad \sigma_y^2 = 1.5 - 1.0 = .5,$$

$$\mu'_{11} = \sum_{y=0}^{2}\sum_{x=0}^{1} xy f(x,y) = .5, \qquad \sigma_{xy} = .5 - .5 = 0,$$

and

$$\rho = 0.$$

Note that we cannot state that x and y are independent in this case
even though $\rho = 0$, since f(x|y) is not the same for all values of y. It
can be seen that f(x|y = 0) = 2/3, 1/3; f(x|y = 1) = 1/3, 2/3; and
f(x|y = 2) = 2/3, 1/3.
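The point of this example, zero correlation without independence, can be verified directly from the table. The sketch below assumes the layout in which x takes the values 0 and 1 and y the values 0, 1, and 2, which is the reading consistent with the stated means; the function names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 12)
# joint[(x, y)] = f(x, y); x in {0, 1}, y in {0, 1, 2}
joint = {
    (0, 0): 2 * p, (0, 1): 2 * p, (0, 2): 2 * p,
    (1, 0): p,     (1, 1): 4 * p, (1, 2): p,
}


def moment(r, s):
    """Product moment about the origin, E(x^r y^s)."""
    return sum(x**r * y**s * f for (x, y), f in joint.items())


def covariance():
    """sigma_xy = E(xy) - E(x) E(y)."""
    return moment(1, 1) - moment(1, 0) * moment(0, 1)


def conditional_x_given_y(y):
    """f(x | y) = f(x, y) / h(y)."""
    hy = sum(f for (_, yy), f in joint.items() if yy == y)
    return {x: f / hy for (x, yy), f in joint.items() if yy == y}


print(covariance())
print(conditional_x_given_y(0), conditional_x_given_y(1))
```

Because the conditional distributions of x differ between y = 0 and y = 1, the variates are dependent even though the covariance vanishes.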

5.8. Moment and Cumulant Generating Functions. For the bivariate
case, the moment generating function about the origin is defined to be
$m(t_x,t_y) = E(e^{xt_x + yt_y})$. The moment generating function about $\mu'_{10}$ and
$\mu'_{01}$ is defined as

$$M(t_x,t_y) = E(e^{T}) = \iint e^{T} f(x,y)\,dx\,dy,$$

where $T = (x - \mu'_{10})t_x + (y - \mu'_{01})t_y$ and the double integration is
performed over the ranges of x and y.

Then,

$$M(t_x,t_y) = \sum_{r,s=0}^{\infty} \mu_{rs}\,\frac{t_x^r t_y^s}{r!\,s!}$$

and

$$\mu_{rs} = \left.\frac{\partial^{r+s} M}{\partial t_x^r\,\partial t_y^s}\right|_{t_x = t_y = 0}.$$

Note that $\mu_{00} = 1$, $\mu_{01} = \mu_{10} = 0$.

The cumulant generating function in this case is given by

$$K(t_x,t_y) = \log m(t_x,t_y) = \sum_{r,s=0}^{\infty} \kappa_{rs}\,\frac{t_x^r t_y^s}{r!\,s!}.$$

If $f(x,y) = g(x) \cdot h(y)$, the moments of x and y may be computed
separately. In this case

$$M(t_x,t_y) = \int e^{T_x} g(x)\,dx \cdot \int e^{T_y} h(y)\,dy = M(t_x) \cdot M(t_y),$$

where

$$T_x = (x - \mu'_{10})t_x, \qquad T_y = (y - \mu'_{01})t_y,$$

and the two integrations are taken over the respective ranges of x and y.
Hence, we have the following theorem:

Theorem 5.1. The moment generating function about the origin of the
sum of two independent variates is the product of the moment generating
function of each.

Proof:

$$m(t)_{x+y} = E[e^{t(x+y)}] = E(e^{tx})E(e^{ty}) = m(t)_x \cdot m(t)_y.$$

The theorem is also true for the moment generating function about the
mean of the sum of two independent variates.
For the bivariate distribution with x and y independent,

$$K = \log m(t_x) + \log m(t_y) = K(t_x) + K(t_y).$$

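5.9. Extension to k Variates. Let

$$f\{x_i\} \prod_{i=1}^{k} dx_i$$

represent the joint probability distribution of the k variates, $x_1$, $x_2$, . . . ,
$x_k$, where $f\{x_i\}$ is the frequency function of the k variates and

$$\prod_{i=1}^{k} dx_i = dx_1\,dx_2 \cdots dx_k.$$

The following properties are those for the continuous case, but they can
be applied equally well to discrete variates:

(a) $$P(w) = \int \cdots \int_w f\{x_i\} \prod_{i=1}^{k} dx_i,$$

where w is a subregion in the k-dimensional space, W.

(b) $$g(x_1) = \int \cdots \int_W f\{x_i\} \prod_{i=2}^{k} dx_i.$$

(c) $$f(x_1 | x_2, \ldots, x_k) = \frac{f\{x_i\}}{f(x_2, \ldots, x_k)},$$

where

$$f(x_2, \ldots, x_k) = \int f\{x_i\}\,dx_1$$

over the range of $x_1$.

(d) For a transformation $u_i = \psi_i\{x_i\}$, i = 1, 2, . . . , k,

$$\frac{1}{J} = \begin{vmatrix}
\dfrac{\partial u_1}{\partial x_1} & \dfrac{\partial u_1}{\partial x_2} & \cdots & \dfrac{\partial u_1}{\partial x_k} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial u_k}{\partial x_1} & \dfrac{\partial u_k}{\partial x_2} & \cdots & \dfrac{\partial u_k}{\partial x_k}
\end{vmatrix}.$$

(e) $$E[\theta\{x_i\}] = \int \cdots \int_W \theta\{x_i\}\, f\{x_i\} \prod_{i=1}^{k} dx_i.$$

(f) $$m\{t_i\} = E\left(e^{\sum t_i x_i}\right) = \int \cdots \int_W e^{\sum t_i x_i} f\{x_i\} \prod_{i=1}^{k} dx_i.$$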

Example 5.14. The multinomial distribution is a generalization of the
binomial distribution. Any one of the events $y_1$, $y_2$, . . . , $y_k$ can occur
with respective probabilities $p_1$, $p_2$, . . . , $p_k$ on a single trial, $\sum_{i=1}^{k} p_i = 1$.
If n trials are made, the probability that $y_1$ occurs $x_1$ times, $y_2$ occurs $x_2$
times, etc., $\sum_{i=1}^{k} x_i = n$, is given by

$$f\{x_i\} = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} p_i^{x_i}.$$

This is the general term of the expansion of

$$(p_1 + p_2 + \cdots + p_k)^n.$$
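The multinomial frequency function can be evaluated exactly for small cases. The following sketch, with the hypothetical helper `multinomial_pmf` and illustrative probabilities, confirms that the terms sum to 1 over all admissible counts and that the case k = 2 reduces to the binomial.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, with n = sum of the counts."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)  # each partial quotient is an integer
    value = Fraction(coeff)
    for x, p in zip(counts, probs):
        value *= p**x
    return value


probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
n = 3
# All count vectors (x_1, x_2, x_3) with x_1 + x_2 + x_3 = n
total = sum(
    multinomial_pmf(c, probs)
    for c in product(range(n + 1), repeat=3)
    if sum(c) == n
)
print(total, multinomial_pmf((1, 1, 1), probs))
```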

For example, a single die can show the number 1, 2, . . . , 6 on the upper
face with equal probabilities, $p_i = \tfrac{1}{6}$.

In this case the moment generating function about the origin is

$$m\{t_i\} = (p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_k e^{t_k})^n.$$



EXERCISES†

† The following exercises contain important theory which will be referred to
subsequently and hence should be worked by all students: 5.1, 5.2, 5.4, 5.5, and 5.6.

5.1. The marginal distribution of x for a bivariate distribution function
f(x,y) is

$$g(x) = \int f(x,y)\,dy.$$

The limits for y must be determined by the region W within which
f(x,y) is defined. If W includes the entire (x,y) plane, the limits are
$(-\infty, \infty)$. However, consider a problem of this nature: f(x,y) = kxy
for 0 < x < y, 0 < y < 1; then

$$g(x) = \int_x^1 kxy\,dy,$$

and

$$h(y) = \int_0^y kxy\,dx.$$

Find the explicit values of g(x) and h(y).

5.2. Given a continuous bivariate frequency function f(x,y) and the
corresponding marginal distributions, g(x) and h(y). Set up the integrals
for the following: (a) mean and variance of y for a given x ($\mu_{y|x}$, $\sigma^2_{y|x}$);
(b) mean and variance of x for a given y.

5.3. Given the bivariate frequency function $f(x,y) = 2/a^2$, 0 < x < y,
0 < y < a. Show that (a) $\iint f(x,y)\,dx\,dy = 1$, over the respective
ranges of x and y; (b) $g(x) = 2(a - x)/a^2$; (c) $h(y) = (2/a^2)y$;
(d) $f(x|y) = 1/y$, 0 < x < y;
(e) $\mu'_{10} = a/3$, $\mu'_{01} = 2a/3$; (f) $\rho = \tfrac{1}{2}$; (g) $\mu_{y|x} = (a + x)/2$; (h) $\mu_{x|y} = y/2$.

5.4. In Exercise 5.2(a) the mean value of y for a given x ($\mu_{y|x}$) is a
function of x and hence defines a curve in the (x,y) plane called the curve
of regression of y on x. If the regression of y on x is linear ($\mu_{y|x} = \alpha + \beta x$),
then

$$\mu_{y|x} = \frac{\int_{-\infty}^{\infty} y f(x,y)\,dy}{g(x)} = \alpha + \beta x$$

or

$$\int_{-\infty}^{\infty} y f(x,y)\,dy = \alpha g(x) + \beta x g(x).$$

By integrating each side of the last equation with respect to x, show that

$$\mu'_{01} = \alpha + \beta\mu'_{10}.$$

Before integrating, as above, multiply both sides by x, and show that

$$\mu'_{11} = \alpha\mu'_{10} + \beta\mu'_{20}.$$

Hence, find the values of $\alpha$ and $\beta$ in terms of the moments of the original
distribution, and show that

$$\mu_{y|x} = \mu'_{01} + \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \mu'_{10}).$$



5 . 6 , In Exercise 5.3(a) if the variance of y for a given x is averaged 
over all values of x, we have 

= / " 1“ (y - yai^yf(^;y} dy dx. 

J — J — w 

If Hyix = a + ^x, as given in Exercise 5,4, show that 


= <^ 1(1 


5 . 6 . Work Exercises 5.3(a), 5.4, and 5.5 for a normal bivariate dis¬ 
tribution. 

5.7. As an example of the distribution of a function of two variables
for a discrete bivariate distribution, consider the distribution of the
sum of two independent Poisson variates, x and y. The joint distribution
of x and y is

$$f(x,y) = \frac{e^{-(m_1 + m_2)}\, m_1^x m_2^y}{x!\,y!}.$$

Let z = x + y, or x = z - y; then

$$f(z,y) = \frac{e^{-(m_1 + m_2)}\, m_1^{z-y} m_2^y}{(z - y)!\,y!}.$$

Sum out the variable y over the range $0 \le y \le z$, and show that

$$f(z) = \frac{e^{-(m_1 + m_2)}(m_1 + m_2)^z}{z!}.$$

Hence, show that the sum of two independent Poisson variates is
distributed as a single Poisson variate with a mean equal to the sum of the
two single means ($m = m_1 + m_2$).

6.8. Given f{x,y) = 6(1 — x — y) for {x,y) contained within the 
triangle bounded hy x — Q, y — Q, x y = 1. 

(а) Find the means and variances of x and y and the covariance of 
X and y. 

(б) Find the equation of the regression line of y on x and 

6.9. Given Jix^y) = kxyil ~ x — y) over the same triangle as in 
Exercise 5.8. 

(a) Find the value of k which makes f{x,y) a frequency distribution. 

(5) Find the marginal distribution gix), 

(c) Find 

5 . 10 . If is a discrete variate having the Poisson distribution 


and y is another discrete variate having the binomial distribution 
fiy\x) = 0 < y < oo , 

show that the marginal distribution of y is 

1/1 

5 . 11 . Given that x and y are normally distributed with zero means 
and variances cr| and o-^. Find the distribution oi z = x y. 

6 . 12 . If x and y represent the number of dots appearing on dice A 
and B, respectively, what is the probability that in throwing the two 
dice a: + % ^ b? 

6 . 13 . Given f{x\y) = /x\ and lri{y) ~ e~^, where x is discrete 

{x = 0, 1, . . .) and y continuous (i/ > 0). Show that g(x) = 

5 . 14 . Given two continuous variates x and y (;r, y > 0) with the fre¬ 
quency distribution/(x,2/) = 2(1 + x + y)~^. Find g{u) and (P(n < 1), 
where u = x y> 

5.15. Given f(x,y) = 1 over the square (0 < x, y < 1). Show that
$P(xy > u) = 1 - u + u \log u$.

6 . 16 . Show that the joint moment generating function of Xi and X2 
from the bivariate normal distribution with means yi and /X 2 , variances 
g\ and g\, and correlation p is given by 

5 . 17 . Use the results of Exercise 5.16 to find the variances and covari¬ 
ance of xi and X 2 . 


CHAPTER 6 


DERIVED SAMPLING DISTRIBUTIONS AND ORTHOGONAL 
LINEAR FUNCTIONS 

6.1. Introduction. For the sequence of the statistical method
(specification, distribution, estimation, and tests of hypotheses) we have
considered certain parent populations and their properties which are usually
specified in applied statistical investigations. After problems of
specification it would seem logical to discuss problems of estimation next in order.
Such problems involve determining what functions of the sample
observations should be used to "best" estimate the parameters of the specified
population distribution, where "best" must be defined in some exact
manner. For example, if $X_1, X_2, \ldots, X_n$ is a sample of n observations,
specified as having been drawn in some manner from a normal distribution
with mean, $\mu$, and variance, $\sigma^2$, what two functions of the observations
should be used to "best" estimate $\mu$ and $\sigma^2$? A function of the
observations used to estimate a population parameter is called an estimator. The
numerical value obtained by using the estimator is called an estimate. It
seemed appropriate to earlier workers in statistics to use the sample mean,
$\bar X = \sum X/n$, and the sample variance, $(s')^2 = \sum (X - \bar X)^2/n$, as estimators
of the two population parameters $\mu$ and $\sigma^2$. However, as will be shown
later, in cases where the sample is a random sample, the "best" sample
estimate of $\sigma^2$ is $s^2 = \sum (X - \bar X)^2/(n - 1)$. A random sample is a sample
drawn in such a manner that the probability of obtaining any member is
independent of the probability of obtaining any other member.

A less restrictive method of estimating the value of a parameter, $\theta$, is a
method which derives limits $c_1$ and $c_2$ which are functions of the sample
values $(X_1, X_2, \ldots, X_n)$. The interval $(c_1, c_2)$ will contain the parameter
$\theta$ a certain percentage of the time. The limits are thus functions of the
sample and this percentage, which is called the confidence probability.
It is understood that in each repeated sample a new set of confidence
limits is determined. The concepts of confidence limits were introduced
by Neyman.

Various methods of estimating the confidence interval will be discussed 
in later chapters. Fisher^ uses the term fiducial limits to indicate 
essentially the same concept. 

In the chronological development of modern statistical methodology,
however, the distribution of estimators and the distributions of certain
functions of these estimators used in making tests of hypotheses and
setting confidence limits did not wait upon the development of a sound
theory of estimation. In this chapter, then, we shall assume, for the
time being, that certain estimators are the "best" estimates of the
corresponding population parameters and that certain functions of these
estimators used in making tests of hypotheses and setting confidence
limits are the "best" such functions.

6.2. Derived Sampling Distribution Problems. In a typical applied 
statistical investigation, for example, a sample survey or an experiment, 
we specify that the observations obtained by some sampling process 
are drawn from some particular parent population distribution. We 
then calculate certain functions of the observations as estimates of the 
parameters of the specified population. Now, in order to set confidence
limits for the population parameters or to make tests of hypotheses
concerning the population parameters, it is necessary to know the
probability distributions of the estimators, for example, of $\bar X$ and $s^2$ for a
normal parent population, and of certain functions of the statistics,
for example, of t, $\chi^2$, and F. Mathematically these probability
distributions are derived from the specified parent population distributions
and hence are called derived sampling distributions.

6.3. Random and Systematic Samples. In applied statistics two
kinds of samples are in common use, (a) the random sample as defined
above, and (b) the systematic sample. In the latter the first member
may be chosen at random, but subsequent members depend upon the
position or value of the preceding members (a systematic sample after
a random start), or all members of the sample may be chosen
systematically, including the first member.

Soil sampling, to study nutritive and other components of the soil,
in most cases makes use of some form of systematic sampling. Soil
samples are selected from various spots in a field, and chemical determina¬ 
tions are made to determine what nutrients are required to bring the 
soil up to a suitable productive capacity. It would be possible to lay a 
grid down on the field and select the samples at random from the grid. 
This method, however, presents many practical difficulties such as the 
exact determination of the selected sample point and the excessive time 
consumed in finding these points, especially if the field is irregular in 
both boundary and contour. It is much easier to select the first point
at random by selecting two random numbers to determine the two
coordinates of the starting point and then proceed a certain number of paces
from this point to the next one by a predetermined route.

Another systematic method is to predetermine a definite route and
collect samples only along this path. This latter method is used quite
extensively in sampling forest stands to estimate the amount of usable
timber.

Most economic data are far from random but have been analyzed in 
many instances in the past as though they were random because of lack 
of techniques for analyzing nonrandom data. More exact methods of 
analyzing this type of nonrandom data have been devised. The chief 
obstacle to the use of the more exact techniques is computational. The 
rapid development of electronic computers may overcome this difficulty. 

The derived sampling distributions obtained in the next chapter will 
assume a random sample of n observations unless otherwise stated. 
Randomness in the sample must be ensured by the use of an objective 
method of selection. Tables of random numbers provide such a means 
and have been made available by Tippett, Fisher and Yates, Snedecor,
and others. 

6.4. Linear Functions. In the derivations of sampling distributions 
in the next chapter it will frequently be necessary to know the distribution 
of some linear functions of the members of a sample. Let such a linear 
function be given by 

$$l = a_1X_1 + a_2X_2 + \cdots + a_nX_n = \sum_{i=1}^{n} a_iX_i,$$

where $a_i$ is some fixed constant and the sample, here not necessarily
random, is represented by $X_1, X_2, \ldots, X_n$ or $\{X_i\}$. In order to obtain
some general results, in the discussion following, each $X_i$ is assumed to be
drawn from a population with mean $\mu_i$ and variance $\sigma_i^2$.

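6.5. Properties of Linear Functions. It follows that

$$E(l) = \sum_{i=1}^{n} a_iE(X_i) = \sum_{i=1}^{n} a_i\mu_i.$$

Also,

$$\sigma_l^2 = E[l - E(l)]^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2 + 2\sum_{i<j} a_ia_j\sigma_{ij}.$$

If $\{X_i\}$ is a random sample, then $\sigma_{ij} = 0$ and

$$\sigma_l^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2,$$

but this will not be true for a systematic sample since $\sigma_{ij}$ will not be zero
in that case.

Finally, if all $\mu_i = \mu$ and all $\sigma_i^2 = \sigma^2$, then

$$E(l) = \mu\sum_{i=1}^{n} a_i$$

and

$$\sigma_l^2 = \sigma^2\sum_{i=1}^{n} a_i^2.$$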
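The expressions above for the mean and variance of a linear function can be evaluated directly. The following is a minimal sketch in Python, not part of the original text: the function name `linear_form_moments`, the coefficients, and the covariance values (including the positive covariance between neighbouring members used to mimic a systematic sample) are all hypothetical choices made for illustration.

```python
from itertools import combinations


def linear_form_moments(a, mu, cov):
    """Return (E(l), var(l)) for l = sum a_i X_i, given means and a covariance matrix."""
    n = len(a)
    mean = sum(a[i] * mu[i] for i in range(n))
    # Variance terms plus twice the covariance terms taken over i < j.
    var = sum(a[i] ** 2 * cov[i][i] for i in range(n))
    var += 2 * sum(a[i] * a[j] * cov[i][j] for i, j in combinations(range(n), 2))
    return mean, var


a = [0.25, 0.25, 0.25, 0.25]   # a_i = 1/n, so l is the sample mean
mu = [10.0] * 4                # common mean (hypothetical value)
sigma2 = 4.0                   # common variance (hypothetical value)

# Random sample: all covariances are zero.
random_cov = [[sigma2 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Hypothetical systematic sample: neighbouring members have covariance 1.
systematic_cov = [
    [sigma2 if i == j else (1.0 if abs(i - j) == 1 else 0.0) for j in range(4)]
    for i in range(4)
]

mean_r, var_r = linear_form_moments(a, mu, random_cov)
mean_s, var_s = linear_form_moments(a, mu, systematic_cov)
print(mean_r, var_r)   # variance equals sigma2 / n for the random sample
print(mean_s, var_s)   # covariance terms change the variance, not the mean
```

With these assumed values the mean is the same in both cases, while the variance of the sample mean differs once the covariance terms are nonzero.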


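Example 6.1. If each $a_i = 1/n$, then $l = (1/n)\sum_{i=1}^{n} X_i$, which is the
sample mean, $\bar X$. Further, for any sample

$$E(\bar X) = \frac{n\mu}{n} = \mu.$$

Also, if the sample be random,

$$\sigma_{\bar X}^2 = E(\bar X - \mu)^2 = n\sigma^2\cdot\frac{1}{n^2} = \frac{\sigma^2}{n}.$$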

In Sec. 6.1 mention was made of a "best" estimator of a population
parameter. One property usually desired in a "best" estimator is that
of being unbiased. An estimator of a population parameter is said to
be unbiased if its expected value is equal to the population parameter.
It follows from Example 6.1 that $\bar X$ is an unbiased estimator of $\mu$ whether
the sample is random or systematic. However, using an analogous
method to that for a random sample for obtaining an estimate of the
variance of a systematic sample does not lead to an unbiased estimate.


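Example 6.2. Let $l = \sum_{i=1}^{n} a_i(X_i - \mu)$, with $\{X_i\}$ a random sample from a
population with $\mu_i = \mu$, $\sigma_i^2 = \sigma^2$; then

$$E(l) = \sum_{i=1}^{n} a_iE(X_i - \mu) = 0.$$

Also,

$$\sigma_l^2 = E[l - E(l)]^2 = E\left[\sum_{i=1}^{n} a_i^2(X_i - \mu)^2 + 2\sum_{i<j} a_ia_j(X_i - \mu)(X_j - \mu)\right] = \sigma^2\sum_{i=1}^{n} a_i^2.$$
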
Example 6.3. To find $E(S)$ where $S = \sum_{i=1}^{n} (X_i - \bar X)^2$ under
conditions given in Example 6.2, we find

$$S = \sum_{i=1}^{n} [(X_i - \mu) - (\bar X - \mu)]^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar X - \mu)^2.$$

But

$$E\left[\sum_{i=1}^{n} (X_i - \mu)^2\right] = n\sigma^2.$$

Also,

$$E(\bar X - \mu)^2 = \frac{\sigma^2}{n}$$

by Example 6.1. Hence,

$$E(S) = n\sigma^2 - \sigma^2 = (n - 1)\sigma^2.$$

Now, if we set $s^2 = S/(n - 1)$, then $E(s^2) = \sigma^2$ and therefore $s^2$ is an
unbiased estimate of $\sigma^2$ for a random sample.
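The result of Example 6.3 can be checked exactly on a small case. The sketch below is an illustration only and rests on assumptions not in the text: a hypothetical four-member population, samples of n = 3 drawn with replacement (so that every ordered sample is equally likely and the members are independent), and exact rational arithmetic through Python's `fractions` module.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 6]   # hypothetical population values
n = 3                        # sample size

N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # population variance


def sum_sq_dev(sample):
    """S = sum of squared deviations of the sample about its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


# Every ordered sample drawn with replacement has probability 1 / N**n.
samples = list(product(population, repeat=n))
expected_S = sum(sum_sq_dev(s) for s in samples) / len(samples)

expected_s2 = expected_S / (n - 1)      # divisor n - 1
expected_s_prime2 = expected_S / n      # divisor n

print(sigma2, expected_S, expected_s2, expected_s_prime2)
```

Under these assumptions the average of S over all samples equals (n - 1) times the population variance, so the divisor n - 1 gives an unbiased estimate while the divisor n does not.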

For a discussion of the above problems for systematic samples, consult
W. G. Madow and Lillian Madow, Lillian Madow, and Cochran.

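6.6. Orthogonal Linear Forms. Consider the two linear forms
$l_1 = \sum_{i=1}^{n} a_iX_i$ and $l_2 = \sum_{i=1}^{n} b_iX_i$. Since $E(l_1) = \sum_{i=1}^{n} a_i\mu_i$ and
$E(l_2) = \sum_{i=1}^{n} b_i\mu_i$, then

$$E(l_1 \pm l_2) = \sum_{i=1}^{n} (a_i \pm b_i)\mu_i.$$

Now, if $\{X_i\}$ is a random sample with $\mu_i = \mu$, $\sigma_i^2 = \sigma^2$, then

$$\sigma_{l_1}^2 = \sigma^2\sum_{i=1}^{n} a_i^2, \qquad \sigma_{l_2}^2 = \sigma^2\sum_{i=1}^{n} b_i^2, \qquad \text{and} \qquad \sigma_{12} = \sigma^2\sum_{i=1}^{n} a_ib_i.$$

Hence the condition that would make $l_1$ and $l_2$ uncorrelated is that
$\sum_{i=1}^{n} a_ib_i = 0$. Two uncorrelated linear forms are said to be orthogonal.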

Example 6.4. For a random sample of 5 drawn from a normal
population with mean, $\mu$, and variance, $\sigma^2$, the mean value and variance of

$$l_1 = X_1 + X_2 + X_3 + X_4 + X_5 = 5\bar X$$

are, respectively,

$$E(l_1) = \mu + \mu + \mu + \mu + \mu = 5\mu; \qquad \sigma_{l_1}^2 = 5\sigma^2.$$

For a second linear form

$$l_2 = -2X_1 - X_2 + X_4 + 2X_5,$$

we obtain

$$E(l_2) = -2\mu - \mu + \mu + 2\mu = 0$$

and also

$$\sigma_{12} = \sigma^2[(1)(-2) + (1)(-1) + (1)(1) + (1)(2)] = 0.$$

For a third linear form

$$l_3 = 2X_1 - X_2 - 2X_3 - X_4 + 2X_5,$$

we obtain

$$E(l_3) = 2\mu - \mu - 2\mu - \mu + 2\mu = 0,$$

also

$$\sigma_{13} = \sigma^2[(1)(2) + (1)(-1) + (1)(-2) + (1)(-1) + (1)(2)] = 0,$$

and

$$\sigma_{23} = \sigma^2[(-2)(2) + (-1)(-1) + (0)(-2) + (1)(-1) + (2)(2)] = 0.$$

Notice that the sum of the coefficients of $l_2$ and $l_3$, respectively, is zero
and that $l_2$ and $l_3$ are uncorrelated with $l_1$, illustrating the theory in the
paragraph above.

Example 6.5. To determine a fourth orthogonal linear form,

$$l_4 = b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5,$$

to $l_1$, $l_2$, and $l_3$ of Example 6.4, we find

$$l_4l_1:\quad b_1 + b_2 + b_3 + b_4 + b_5 = 0,$$
$$l_4l_2:\quad -2b_1 - b_2 + b_4 + 2b_5 = 0,$$
$$l_4l_3:\quad 2b_1 - b_2 - 2b_3 - b_4 + 2b_5 = 0.$$

Eliminating $b_3$ from the first and third equations and then $b_2$ from the
remaining two equations, we find

$$b_1 + b_4 + 3b_5 = 0.$$

This condition will be satisfied if we let $b_5 = 1$, $b_4 = -2$, and $b_1 = -1$.
Then, we find $b_2 = 2$ and $b_3 = 0$. Hence a fourth orthogonal form
desired is

$$l_4 = -X_1 + 2X_2 - 2X_4 + X_5.$$

There are an infinite number of such functions.
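The orthogonality conditions of Examples 6.4 and 6.5 amount to vanishing sums of cross products of coefficients, which is easy to confirm mechanically. The short Python sketch below is offered only as a check of the arithmetic; the dictionary layout and the helper name `dot` are illustrative choices.

```python
from itertools import combinations

# Coefficients of X1..X5 in the four linear forms of Examples 6.4 and 6.5.
forms = {
    "l1": [1, 1, 1, 1, 1],
    "l2": [-2, -1, 0, 1, 2],
    "l3": [2, -1, -2, -1, 2],
    "l4": [-1, 2, 0, -2, 1],
}


def dot(a, b):
    """Sum of cross products of two coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two forms are orthogonal when the sum of cross products is zero.
cross_products = {(p, q): dot(forms[p], forms[q]) for p, q in combinations(forms, 2)}
all_orthogonal = all(value == 0 for value in cross_products.values())

# For a random sample, the variance of each form is sigma^2 times this sum of squares.
sums_of_squares = {name: dot(coeffs, coeffs) for name, coeffs in forms.items()}

print(cross_products)
print(all_orthogonal, sums_of_squares)
```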

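6.7. Linear Forms with Normally Distributed Variates. Suppose
that the $X_i$ in the linear form $l = \sum_{i=1}^{n} a_iX_i$ each follows a normal parent
population distribution with mean, $\mu_i$, and variance, $\sigma_i^2$. Then, the
probability of getting the particular $X_i$'s in a random sample of n will be

$$\frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\frac{(X_i - \mu_i)^2}{\sigma_i^2}\right]\prod_{i=1}^{n} dX_i.$$

It is desired to find the distribution of $[l - E(l)]$.

The moment generating function of $[l - E(l)]$ is

$$M(t) = \frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left[t\sum_{i=1}^{n} a_i(X_i - \mu_i) - \frac{1}{2}\sum_{i=1}^{n}\frac{(X_i - \mu_i)^2}{\sigma_i^2}\right]\prod_{i=1}^{n} dX_i.$$

By completing the square of the exponent we have, for each i,

$$-\frac{1}{2\sigma_i^2}\left[(X_i - \mu_i)^2 - 2\sigma_i^2ta_i(X_i - \mu_i) + \sigma_i^4t^2a_i^2\right] + \frac{\sigma_i^2t^2a_i^2}{2} = -\frac{1}{2\sigma_i^2}\left[(X_i - \mu_i) - \sigma_i^2ta_i\right]^2 + \frac{\sigma_i^2t^2a_i^2}{2}.$$

Hence,

$$M(t) = \prod_{i=1}^{n} e^{a_i^2\sigma_i^2t^2/2} = \exp\left[\frac{t^2}{2}\sum_{i=1}^{n} a_i^2\sigma_i^2\right].$$

Now, the moment generating function for the normal distribution

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-y^2/2\sigma^2}, \qquad -\infty < y < \infty,$$

is

$$m(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ty - y^2/2\sigma^2}\,dy = e^{\sigma^2t^2/2}.$$

Hence, we may say that l is normally distributed with mean $E(l) = \sum_{i=1}^{n} a_i\mu_i$
and variance $\sum_{i=1}^{n} a_i^2\sigma_i^2$. This result is analogous to that found in
Sec. 6.5 for the mean and variance for nonspecified parent population
distributions.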


EXERCISES

6.1. Determine a fifth linear form orthogonal to the forms in Examples
6.4 and 6.5. How many linear forms make a complete set for a sample
of n?

6.2. Given a random sample of N, all members being drawn from the 
same population. Determine the relationship between the cumulants 
for the total of the sample and the cumulants for a single member of the 
sample. Hint: 

$$m_{\Sigma X}(t) = [m_X(t)]^N.$$

6.3. Use the results of Exercise 6.2 to determine the first three cumu¬ 
lants for the total of N from a binomial distribution. {N represents the 
number of samples and n the number of independent trials per sample.) 

6.4. What happens to Exercise 6.2 if each member of the random
sample is drawn from different populations? Hint: Show that

$$K_{\Sigma X}(t) = \sum_{i=1}^{N} K_i(t)$$

and the rth cumulant for the sum is hence

$$\kappa_r = \sum_{i=1}^{N} \kappa_{ri}.$$

Consider this problem for Poisson distributions with unlike means and
binomial distributions with unlike values of n and p.

6.5. Given two independent estimates of Y, $X_1$ and $X_2$, with variances
$\sigma_1^2$ and $\sigma_2^2$, respectively. If we desire to estimate Y as an unbiased linear
function of the X's, find the coefficients of the X's which will minimize
the variance of the estimate. Hint: $E(X_i) = Y$. Let $\hat Y = l_1X_1 + l_2X_2$.

6.6. If the $X_i$ in the linear form $l = \sum_{i=1}^{n} a_iX_i$ are normally and
independently distributed with means $\mu_i$ and variances $\sigma_i^2$ and all $a_i = 1/n$,
$\mu_i = \mu$, and $\sigma_i = \sigma$, then $l = \bar X$. Show that the mean of a normal sample
is normally distributed with mean $\mu$ and variance $\sigma^2/n$.

6.7. Suppose that $\bar X_1$ represents the mean of a sample of $n_1$ taken from
a normal population with mean $\mu_1$ and variance $\sigma_1^2$ and $\bar X_2$ the mean of a
sample of $n_2$ taken from another normal population with mean $\mu_2$ and
variance $\sigma_2^2$. Let $Z = \bar X_1 - \bar X_2$. Show that Z is normally distributed
with mean $\mu_1 - \mu_2$ and variance

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$






6.8. Consider the following applied problem: The effect of a nitrogen
top-dressing on a crop is to be determined. In addition it is desired to
discover at what time during the growing season the top-dressing, if
beneficial, should be applied. The experiment was designed as follows:
Use four plots, one with no nitrogen, one with nitrogen applied early,
one applied in the middle of the growing season, and one applied later.
Naturally, this entire experiment would be repeated several times, say
r, in order to discover how consistent the differences were from replication
to replication. The totals of the 4r plots are indicated as follows:

                  Nitrogen applied
    No N      Early     Middle    Late
    T_0       T_1       T_2       T_3

$$T_0 = X_{01} + X_{02} + \cdots + X_{0r}, \qquad T_1 = \sum_{j=1}^{r} X_{1j}, \quad \text{etc.}$$

Suppose $X_{ij}$ is $N(\mu_i, \sigma^2)$, where $\mu_i$ is the mean effect of the ith
treatment. Three pertinent independent linear forms are: $l_1 = -3T_0 + T_1 + T_2 + T_3$,
for determining the effect of nitrogen; $l_2 = T_3 - T_1$, for
determining the linear effect; $l_3 = T_1 - 2T_2 + T_3$, for determining the
quadratic effect. Find a fourth linear form to complete the set, and
determine the variance of each.
CHAPTER 7


DERIVED SAMPLING DISTRIBUTIONS: NORMAL PARENT 

POPULATION 

7.1. Introduction. In this chapter we shall consider only random
samples of n drawn from a normal parent population with mean $\mu$ and
variance $\sigma^2$, that is, from $N(\mu,\sigma^2)$. If it is desired to indicate that two or
more variates are also independently normally distributed with mean $\mu$
and variance $\sigma^2$, we shall denote this by

NID($\mu,\sigma^2$).

The derived sampling distributions which will be discussed here are of
importance in applied statistics in making tests of hypotheses and setting
confidence limits. As pointed out earlier, we shall assume in these
discussions of distribution theory that certain estimators of population
parameters and certain test criteria are the "best," and we shall be
interested here in obtaining their probability distributions. In later
chapters on tests of hypotheses and estimation, a discussion will be given
of what properties a "best" estimator or an "appropriate" test criterion
should have.

Methods of obtaining test criteria and setting confidence limits will 
now be explained in order that an understanding of the uses of these 
important statistical methods may be obtained simultaneously with the 
derivations of the related sampling distributions. The methods described 
below do not take into account important considerations described in 
detail in subsequent chapters on estimation and tests of hypotheses, but 
for the cases presented the corresponding techniques will be the same. 

If it be possible to find the sampling distribution of a function of an 
estimator and its corresponding population parameter which is inde¬ 
pendent of the parameter and all other unknown parameters, then hypo¬ 
theses specifying numerical values of these parameters may be tested. 
Such a hypothesis is often called a null hypothesis, and the function called 
a test criterion. 

By making probability statements regarding the test criterion in the 
form of inequalities, and then solving these inequalities for the population 
parameter, one may set confidence limits for the particular parameter. 

Illustrations for the techniques described in the above two paragraphs
will be presented in subsequent sections of this chapter.

If the specified parent population distribution is not normal, then the
derived sampling distributions may become unduly complicated. Non¬ 
randomness in the sample presents additional difficulties. Consequences 
of the relaxation of such assumptions regarding the parent population and 
the sample have been studied to some extent, and further investigations 
may be expected in the future. 

7.2. Distribution of the Sample Mean, $\bar X$. In Sec. 6.7 it was shown,
for a random sample of size n, with each $X_i$ from a parent population
$N(\mu_i,\sigma_i^2)$, that the distribution of $\sum_{i=1}^{n} a_iX_i$ is
$N\left(\sum_{i=1}^{n} a_i\mu_i, \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$.
If $a_i = 1/n$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$, then

$$\sum_{i=1}^{n} a_iX_i = \bar X, \qquad \sum_{i=1}^{n} a_i\mu_i = \mu, \qquad \text{and} \qquad \sum_{i=1}^{n} a_i^2\sigma_i^2 = \frac{\sigma^2}{n}.$$

Therefore the distribution of $\bar X$ is $N(\mu, \sigma^2/n)$.

Let

$$T = \frac{\bar X - \mu}{\sigma/\sqrt{n}}.$$

The sampling distribution of T is $N(0,1)$, which is symmetrical about
zero. Then, if $\sigma$ be known, T fulfills the requirements set out in Sec. 7.1
for a test criterion which may be used to test a hypothesis specifying a
numerical value for $\mu$, say $\mu = \mu_0$.

Suppose a random sample of size n is drawn from the assumed
population, $N(\mu_0,\sigma^2)$. The sample mean, $\bar X$, and a corresponding value of T,
called $T_0$, are calculated, where

$$T_0 = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}.$$

It is possible, using Table Ib in the Appendix, to find the probability of T
being greater in absolute value than the calculated $T_0$. Should this
probability be as small as or smaller than some preassigned small numerical
value, $\alpha$, we say that we have evidence that the hypothesis being tested
is not true, or we should be forced to accept the occurrence of a very
unlikely event. This $\alpha$ is called the significance probability, and the
entire testing procedure a test of significance or a test of the hypothesis that
$\mu = \mu_0$. A more precise mathematical treatment of tests of hypotheses
will be presented in Chap. 11.

The validity of the procedure set out in the above paragraph may be
checked empirically by constructing a simulated population, $N(\mu_0,\sigma^2)$,
drawing a large number of samples of size n, and finding the empirical
distribution of the values of $T_0$ obtained from these samples of size n. It
would be found that approximately $\alpha$ of the calculated $T_0$ values would
lie outside of the corresponding tabular T values (usually denoted as
$-T_\alpha$ and $T_\alpha$).
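The empirical check described in the paragraph above can be sketched with a small simulation. This is an illustration under stated assumptions, not the authors' procedure: the function name `rejection_rate`, the seed, the number of samples, and the population values (borrowed from the setting of a mean of 50 and variance of 90 with n = 9) are arbitrary, and the tabular value $T_\alpha$ is taken from Python's `statistics.NormalDist` rather than from the Appendix tables.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(mu0, sigma, n, alpha, n_samples, seed=12345):
    """Fraction of simulated samples from N(mu0, sigma^2) with |T0| > T_alpha."""
    rng = random.Random(seed)
    t_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed tabular value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(n_samples):
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        t0 = (xbar - mu0) / standard_error
        if abs(t0) > t_alpha:
            rejections += 1
    return rejections / n_samples


rate = rejection_rate(mu0=50.0, sigma=90 ** 0.5, n=9, alpha=0.05, n_samples=20000)
print(rate)   # should be close to alpha, up to sampling fluctuation
```

The observed fraction will vary with the seed and the number of samples; only its closeness to $\alpha$ is of interest here.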

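The probability, $1 - \alpha$, that T lies between $-T_\alpha$ and $T_\alpha$ may be
expressed as

$$P[-T_\alpha < T < T_\alpha] = 1 - \alpha, \qquad \text{or} \qquad \frac{1}{\sqrt{2\pi}}\int_{-T_\alpha}^{T_\alpha} e^{-y^2/2}\,dy = 1 - \alpha.$$

It follows that

$$P[T < T_\alpha] = 1 - \frac{\alpha}{2},$$

or

$$P\left[\frac{\bar X - \mu}{\sigma/\sqrt{n}} < T_\alpha\right] = 1 - \frac{\alpha}{2}.$$

Solving for $\mu$, we find

$$P\left[\mu > \bar X - \frac{T_\alpha\sigma}{\sqrt{n}}\right] = 1 - \frac{\alpha}{2}.$$

Similarly,

$$P\left[\mu < \bar X + \frac{T_\alpha\sigma}{\sqrt{n}}\right] = 1 - \frac{\alpha}{2}.$$

Hence,

$$P\left[\bar X - \frac{T_\alpha\sigma}{\sqrt{n}} < \mu < \bar X + \frac{T_\alpha\sigma}{\sqrt{n}}\right] = 1 - \alpha. \qquad (1)$$

The symbol $P[T < T_\alpha]$ is read "the probability of T being less than $T_\alpha$."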

The inequality set out in (1) above provides a confidence interval for $\mu$
at the $(1 - \alpha)$ confidence probability level. The validity of the procedure
described may be checked in a similar manner as that suggested for the
test criterion in preceding paragraphs. It should be noted that $\mu$ is a
population parameter and does not have a sampling distribution but is
some fixed though unknown quantity. For some sample in hand, (1)
above is not an a priori probability statement but indicates an average
outcome to be expected in repeated sampling. This kind of probability
is called a confidence probability. A more precise mathematical
formulation of these concepts will be presented in Chap. 10.

Values of $(1 - \alpha/2)$ are given in Table Ib in the Appendix for y ranging
from 0 in steps of .01 to 3.49. The reader must interpolate in order to
determine $y = T_\alpha$ for a specified value of $\alpha$ (see the section on Explanation
of the Tables preceding the tables). Values of $T_\alpha$ are given at the bottom
of Table Ib in the Appendix for a few selected values of $\alpha$.

Example 7.1. Given a normal population of yields with variance
$\sigma^2 = 90$. The following nine sample values were obtained from this
population: 65, 45, 43, 40, 64, 58, 52, 56, 63. The mean of this sample is
$\bar X = 54$. On the basis of previous experience it was believed that the
true mean, $\mu$, was 50. Using a significance probability of $\alpha = .05$, does
the sample contradict this presumption? We find that

$$T_0 = \frac{54 - 50}{\sqrt{10}} = 1.26.$$

From Table Ib, we see that $T_{.05} = 1.96$. Since $T_0$ is less than $T_{.05}$, we
cannot contradict the presumption that $\mu = 50$ on the basis of this
sample.

On the basis of this sample, in the absence of any knowledge of $\mu$, we
could set up confidence limits for $\mu$. The 95 per cent confidence limits
are

$$54 - (1.96)(3.16) < \mu < 54 + (1.96)(3.16),$$

or

$$47.8 < \mu < 60.2.$$

This states that from our sample of nine values, we infer that the true
mean lies between 47.8 and 60.2 with a confidence probability of 95 per
cent.
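The arithmetic of Example 7.1 can be retraced in a few lines. The Python sketch below simply repeats the calculation with the data and the tabular value 1.96 quoted in the example; the variable names are illustrative.

```python
from math import sqrt

yields = [65, 45, 43, 40, 64, 58, 52, 56, 63]   # sample values of Example 7.1
sigma2 = 90.0     # known population variance
mu0 = 50.0        # hypothesized mean
t_alpha = 1.96    # tabular value for alpha = .05, as quoted in the example

n = len(yields)
xbar = sum(yields) / n
standard_error = sqrt(sigma2 / n)        # sigma / sqrt(n) = sqrt(10)

T0 = (xbar - mu0) / standard_error
reject = abs(T0) > t_alpha               # two-tailed comparison

lower = xbar - t_alpha * standard_error
upper = xbar + t_alpha * standard_error

print(round(T0, 2), reject)
print(round(lower, 1), round(upper, 1))
```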

7.3. Law of Large Numbers. In the preceding section it was shown,
for a random sample of n from an arbitrary parent population, that
$E[\bar X] = \mu$ and $\sigma_{\bar X}^2 = \sigma^2/n$. It follows that whatever the form of the
parent population distribution (provided the variance is finite), the
distribution of the sample mean becomes more and more concentrated about
the population mean as the size of the sample increases. It is evident
then that as the size of the sample is increased, the more confident we can
be that the sample mean provides a "good" estimate of the population
mean. Essentially this is the meaning of the law of large numbers. A
more precise statement of this property is provided by Tchebysheff's
inequality, which will be derived in Chap. 8.

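7.4. The Central-limit Theorem. This theorem states that:

If an arbitrary population distribution has a mean $\mu$ and finite variance
$\sigma^2$, then the distribution of the sample mean approaches the normal distribution
with mean $\mu$ and variance $\sigma^2/n$ as the sample size n increases.

The proof of this important theorem is beyond the scope of this text
except for a distribution possessing a moment generating function. Now,
if Y is distributed as $N(\mu,\sigma^2)$ and $u = (Y - \mu)/\sigma$, then

$$m_u(t) = E[e^{ut}] = e^{t^2/2}, \qquad \therefore\ K_u(t) = \frac{t^2}{2}.$$

If X has an arbitrary distribution, and $v = (X - \mu)/\sigma$, then

$$m_v(t) = E[e^{vt}] = 1 + \frac{t^2}{2} + \frac{\mu_3}{\sigma^3}\,\frac{t^3}{3!} + \cdots,$$

where $\mu_3$ denotes the third moment of X about its mean.

Also, the moment generating function of $w = (\bar X - \mu)/(\sigma/\sqrt{n})$ is

$$m_w(t) = E\left[e^{\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right)t}\right] = e^{-\sqrt{n}\,\mu t/\sigma}\,E\left[e^{\frac{t}{\sigma\sqrt{n}}(X_1 + X_2 + \cdots + X_n)}\right] = \left[m_v\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$

It follows that

$$K_w(t) = \log m_w(t) = n\log m_v\left(\frac{t}{\sqrt{n}}\right),$$

$$\therefore\ K_w(t) = n\log\left[1 + \frac{t^2}{2n} + \frac{\mu_3}{\sigma^3}\,\frac{t^3}{3!\,n\sqrt{n}} + \cdots\right].$$

Since

$$\log(1 + Z) = Z - \tfrac{1}{2}Z^2 + \tfrac{1}{3}Z^3 - \cdots,$$

we have

$$K_w(t) = \frac{t^2}{2} + \frac{\mu_3}{\sigma^3}\,\frac{t^3}{3!\sqrt{n}} + \cdots,$$

$$\therefore\ \lim_{n\to\infty} K_w(t) = \frac{t^2}{2}.$$

Hence, $K_w(t)$ approaches $K_u(t)$, and it follows that the distribution of $\bar X$
approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the
sample size n increases.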
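The approach to normality stated above can be illustrated by simulation. The following Python sketch is only an illustration under assumptions of our own choosing: the parent population is taken to be exponential with mean 1 and variance 1 (a markedly skewed distribution), and the function names, seed, and sample counts are arbitrary. It compares the third moment of the standardized mean w for a small and a larger n; for a normal variate this moment is zero.

```python
import random
from statistics import fmean, pvariance


def standardized_means(n, n_samples, seed=2024):
    """Simulate w = (xbar - mu) / (sigma / sqrt(n)) for an exponential parent (mu = sigma = 1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        xbar = fmean(rng.expovariate(1.0) for _ in range(n))
        values.append((xbar - 1.0) * n ** 0.5)
    return values


def third_moment(values):
    """Empirical third moment about zero (w has mean zero by construction)."""
    return fmean(v ** 3 for v in values)


w_small = standardized_means(n=2, n_samples=20000)
w_large = standardized_means(n=50, n_samples=20000)

print(third_moment(w_small), third_moment(w_large))   # skewness shrinks as n grows
print(fmean(w_large), pvariance(w_large))             # near 0 and 1, respectively
```

The simulated figures fluctuate with the seed; the point is only that the skewness of w diminishes as n increases while its mean and variance stay near 0 and 1.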

7.5. Chi-square Distribution. An important distribution that enters
into the theory of derived sampling distributions is the chi-square
distribution, defined as

$$f(\chi^2)\,d(\chi^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,(\chi^2)^{(\nu/2)-1}\,e^{-\chi^2/2}\,d(\chi^2), \qquad 0 < \chi^2 < \infty,$$

where $\nu$ is called the degrees of freedom. It is easy to see that

$$P(\chi^2 < \chi_\alpha^2) = \int_0^{\chi_\alpha^2} f(\chi^2)\,d(\chi^2) = \frac{1}{\Gamma(\nu/2)}\int_0^{y_\alpha} y^{(\nu/2)-1}e^{-y}\,dy = \frac{\Gamma_{y_\alpha}(\nu/2)}{\Gamma(\nu/2)},$$

which is the incomplete Gamma function, $I_{y_\alpha}(\nu/2)$, with $y_\alpha = \chi_\alpha^2/2$. This
function has been tabulated in the form $I(u,p)$ by Karl Pearson for
various values of $u = y_\alpha/\sqrt{p + 1} = \chi_\alpha^2/(2\sqrt{p + 1})$ and $p = (\nu/2) - 1$.
In this form, $P(\chi^2 > \chi_\alpha^2) = 1 - I(u,p)$. These tables are described and
their uses in statistics explained in detail by Bancroft.

Catherine M. Thompson gives values of $\chi_\alpha^2$ for $\alpha$ = .995, .990, .975,
.950, .900, .750, .500, .250, .100, .050, .025, .010, and .005 and $\nu$ = 1(1)30,
where $\alpha = P(\chi^2 > \chi_\alpha^2)$. These values are presented in Table II in the
Appendix.

7.6. Properties of the Chi-square Distribution. The moment
generating function for $\chi^2$ is

$$m(t) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_0^\infty (\chi^2)^{(\nu/2)-1}\,e^{-(1-2t)(\chi^2/2)}\,d(\chi^2) = (1 - 2t)^{-\nu/2}.$$

It follows that the cumulant generating function is

$$K(t) = -\frac{\nu}{2}\log(1 - 2t) = \frac{\nu}{2}\sum_{i=1}^{\infty}\frac{(2t)^i}{i},$$

whence

$$\kappa_i = (i - 1)!\,2^{i-1}\,\nu.$$

The $\chi^2$ distribution has mean $\nu$ and variance $2\nu$.

Consider a new variable $y = (\chi^2 - \nu)/\sqrt{2\nu}$, for which $\mu = 0$, $\sigma^2 = 1$.
The cumulant generating function for this new variable is

$$K(t) = \frac{\nu}{2}\sum_{i=2}^{\infty}\frac{(2t)^i}{i\,(2\nu)^{i/2}}$$

and

$$\kappa_1 = 0; \qquad \kappa_i = \frac{2^{(i/2)-1}\,(i - 1)!}{\nu^{(i/2)-1}}, \quad i \ge 2.$$

Then

$$\kappa_3 = \frac{2\sqrt{2}}{\sqrt{\nu}}, \qquad \kappa_4 = \frac{12}{\nu}, \qquad \text{etc.}$$

Also, for $i > 2$

$$\lim_{\nu\to\infty}\kappa_i = 0.$$

But this is a property of the normal distribution, indicating that the
$\chi^2$ distribution approaches the normal distribution for large $\nu$.
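The behavior of the standardized cumulants can be tabulated directly from the cumulant formula $\kappa_i = (i-1)!\,2^{i-1}\nu$. The Python sketch below is an illustration only; the function names are hypothetical, and the standardized cumulants are obtained by dividing $\kappa_i$ by $(2\nu)^{i/2}$, which is the effect of the change of variable $y = (\chi^2 - \nu)/\sqrt{2\nu}$ for $i \ge 2$.

```python
from math import factorial


def chi_square_cumulant(i, nu):
    """i-th cumulant of chi-square with nu degrees of freedom."""
    return factorial(i - 1) * 2 ** (i - 1) * nu


def standardized_cumulant(i, nu):
    """i-th cumulant (i >= 2) of y = (chi2 - nu) / sqrt(2 nu)."""
    return chi_square_cumulant(i, nu) / (2 * nu) ** (i / 2)


# Third and fourth standardized cumulants shrink toward zero as nu grows.
for nu in (1, 10, 100, 1000):
    print(nu, standardized_cumulant(3, nu), standardized_cumulant(4, nu))
```

The printed values decrease like $1/\sqrt{\nu}$ and $1/\nu$, respectively, in line with the limiting statement above.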

7.7. Distribution of a Sum of Squares of Deviations. Given a sample
$\{X_i\}$ of n from the population of X's which are NID($\mu,\sigma^2$). Set

$$u_i^2 = (X_i - \mu)^2/\sigma^2.$$

We wish to find the distribution of

$$\sum_{i=1}^{n} u_i^2 = \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2}.$$

Now, since the X's are independent,

$$m_{\Sigma u^2}(t) = m_1(t)\cdot m_2(t)\cdots m_n(t),$$

where $m_i(t) = m_{u_i^2}(t)$. Hence, since all the X's have the same distribution
function,

$$m_{\Sigma u^2}(t) = [m_1(t)]^n.$$

Now

$$m_1(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tu^2 - u^2/2}\,du = (1 - 2t)^{-1/2}.$$

Hence,

$$m_{\Sigma u^2}(t) = (1 - 2t)^{-n/2}.$$

But this is the moment generating function of a $\chi^2$ with $\nu = n$ degrees of
freedom; hence

$$\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2}$$

is distributed as $\chi^2$ with $\nu = n$ degrees of freedom. In this case each X
furnishes a single degree of freedom; hence, the number of degrees of
freedom is the number of independent values which make up $\chi^2$. The
number of degrees of freedom in general is the total number of observations
less the number of independent restraints imposed on the observations in
forming the distribution.

This $\chi^2$ distribution can be used to test hypotheses and to set up
confidence intervals concerning the population variance, $\sigma^2$, assuming the
population mean, $\mu$, is known. A sample value of $\chi^2$ is computed from a
sample of n as follows:

$$\chi_0^2 = \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma_0^2} = \frac{n(s')^2}{\sigma_0^2}.$$

A significance probability, say $\alpha$, is decided upon, and $\chi_0^2$ is then compared
with the corresponding tabular value, $\chi_\alpha^2$, where

$$P(\chi^2 > \chi_\alpha^2) = \alpha.$$

If $\chi_0^2$ is greater than $\chi_\alpha^2$, we conclude that $\sigma^2$ is greater than the assumed $\sigma_0^2$.
It should be noted here that we can only test the hypothesis $\sigma^2 = \sigma_0^2$
against the alternative that $\sigma^2 > \sigma_0^2$. If it is not known that $\sigma^2$ is at least
as large as $\sigma_0^2$, a more general testing procedure is needed; this will be
discussed in later chapters.

The construction of a confidence interval for $\sigma^2$ is complicated by the fact
that the $\chi^2$ distribution is not symmetrical. Hence it is not clear that an
equal amount of probability should be put outside each confidence limit.
This problem is discussed in detail in Chap. 10. If the same amount of
probability ($\alpha/2$) is put outside each confidence limit, the confidence limits
for $\sigma^2$ are

$$\frac{n(s')^2}{\chi_2^2} < \sigma^2 < \frac{n(s')^2}{\chi_1^2},$$

where $n(s')^2 = \sum_{i=1}^{n}(X_i - \mu)^2$, $P(\chi^2 > \chi_2^2) = \alpha/2$, and $P(\chi^2 < \chi_1^2) = \alpha/2$.

Example 7.2. Suppose in Example 7.1 that the true mean of the
population was $\mu = 50$ but that $\sigma^2$ was unknown. In this case

$$9(s')^2 = \sum (X_i - \mu)^2 = 868.$$

If we want to test the hypothesis that $\sigma^2 = 90$, we use the test criterion

$$\chi_0^2 = \frac{868}{90} = 9.64,$$

to be compared with $\chi^2$ for 9 degrees of freedom. If we use $\alpha = .05$,
$\chi_\alpha^2 = 16.9$. Hence we cannot conclude that $\sigma^2$ is greater than 90 on the
basis of this sample. We see that the 95 per cent confidence limits are

$$\frac{868}{19.0} < \sigma^2 < \frac{868}{2.70},$$

or $46 < \sigma^2 < 321$.
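The figures of Example 7.2 can be reproduced as follows. This Python sketch only repeats the arithmetic; the tabular chi-square values 16.9, 19.0, and 2.70 for 9 degrees of freedom are copied from the example rather than computed, and the variable names are illustrative.

```python
yields = [65, 45, 43, 40, 64, 58, 52, 56, 63]   # data of Example 7.1
mu = 50            # population mean, assumed known
sigma0_sq = 90     # hypothesized variance

# Tabular values for 9 degrees of freedom, as quoted in Example 7.2.
chi_sq_05 = 16.9       # P(chi2 > 16.9) = .05
chi_sq_upper = 19.0    # P(chi2 > 19.0) = .025
chi_sq_lower = 2.70    # P(chi2 < 2.70) = .025

sum_sq = sum((x - mu) ** 2 for x in yields)     # n (s')^2
chi_sq_0 = sum_sq / sigma0_sq
exceeds = chi_sq_0 > chi_sq_05

lower = sum_sq / chi_sq_upper
upper = sum_sq / chi_sq_lower

print(sum_sq, round(chi_sq_0, 2), exceeds)
print(round(lower), round(upper))
```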


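7.8. Reproductive Property of the Chi-square Distribution. Using
the relationship

$$\chi^2 = \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2},$$

we may prove that the sum of n variates, each independently distributed
as $\chi^2$ with $\nu_i$ degrees of freedom, is itself distributed as $\chi^2$ with $\nu = \sum \nu_i$
degrees of freedom. Suppose

$$\chi_1^2 = \frac{\sum_{i=1}^{\nu_1}(X_{1i} - \mu_1)^2}{\sigma_1^2}$$

and

$$\chi_2^2 = \frac{\sum_{i=1}^{\nu_2}(X_{2i} - \mu_2)^2}{\sigma_2^2}.$$

Then $\chi^2 = \chi_1^2 + \chi_2^2$ has the moment generating function

$$m_{\chi^2}(t) = [m_{\chi_1^2}(t)][m_{\chi_2^2}(t)] = (1 - 2t)^{-\nu_1/2}(1 - 2t)^{-\nu_2/2} = (1 - 2t)^{-(\nu_1+\nu_2)/2}.$$

But this is the moment generating function for a chi-square with $(\nu_1 + \nu_2)$
degrees of freedom; hence, $\chi_1^2 + \chi_2^2$ is distributed similarly. This result
may be extended to n variates.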

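7.9. Simultaneous Distribution of the Sample Mean $\bar X$ and Variance
Estimate, $s^2$. It was shown in Sec. 7.7 that

$$\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2}$$

is distributed as chi-square with $\nu = n$ degrees of freedom. We shall now
obtain the simultaneous distribution of two parts of this sum when
expressed as

$$\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2},$$

where it will be recalled that $\bar X$ is an unbiased estimate of $\mu$ and $s^2$ is an
unbiased estimate of $\sigma^2$.

In order to simplify the notation in the following derivation, let
$u_i = (X_i - \mu)/\sigma$, so that $u_i$ is $N(0,1)$ and $\bar u = (\bar X - \mu)/\sigma$. Let $\gamma^2 = s^2/\sigma^2$.

The derivation will be accomplished by making use of orthogonal linear
forms as discussed in Sec. 6.6. To this end we set up the following
orthogonal linear transformation:

$$y_1 = \frac{u_1 + u_2 + \cdots + u_n}{\sqrt{n}},$$
$$y_2 = \frac{u_1 - u_2}{\sqrt{2}},$$
$$y_3 = \frac{u_1 + u_2 - 2u_3}{\sqrt{6}},$$
$$\cdots$$
$$y_n = \frac{u_1 + u_2 + \cdots + u_{n-1} - (n - 1)u_n}{\sqrt{n(n - 1)}}.$$

It is easily seen that each $y_i$ is orthogonal to each $y_j$ ($j \ne i$). Also the
denominators have been chosen to make the variances of the y's equal
unity. This type of transformation is called completely orthogonal, and
these requirements may be designated, writing $y_i = \sum_j c_{ij}u_j$, as

$$\sum_{j=1}^{n} c_{ij}c_{kj} = 0 \quad (i \ne k), \qquad \sum_{j=1}^{n} c_{ij}^2 = 1.$$

From Sec. 6.7 we know that the $y_i$'s are NID(0,1); hence, the
simultaneous distribution of the $\{y_i\}$ is identical with that of the $\{u_i\}$ and
$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} u_i^2$. Also, since $y_1^2 = n\bar u^2$, then

$$\sum_{i=2}^{n} y_i^2 = (n - 1)\gamma^2.$$

Since $y_1$ is independent of the other y's, $\bar u$ is independent of $\gamma^2$. Hence

$$f(\bar u,\gamma^2)\,d\bar u\,d(\gamma^2) = f_1(\bar u)\,f_2(\gamma^2)\,d\bar u\,d(\gamma^2).$$

We know that

$$f_1(\bar u)\,d\bar u = \sqrt{\frac{n}{2\pi}}\,e^{-n\bar u^2/2}\,d\bar u.$$

It is also known that

$$\sum_{i=2}^{n} y_i^2$$

is distributed as chi-square with $(n - 1)$ degrees of freedom since there
are $(n - 1)$ independent normal variates. Now, $\chi^2 = (n - 1)\gamma^2$; hence

$$f_2(\gamma^2)\,d(\gamma^2) = \frac{\left(\frac{n-1}{2}\right)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2}\right)}\,(\gamma^2)^{(n-3)/2}\,e^{-(n-1)\gamma^2/2}\,d(\gamma^2).$$

Using the original X's, which were $N(\mu,\sigma^2)$, the simultaneous distribution
of $\bar X$ and $s^2$ is

$$f(\bar X,s^2)\,d\bar X\,d(s^2) = \sqrt{\frac{n}{2\pi\sigma^2}}\;\frac{\left(\frac{n-1}{2\sigma^2}\right)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2}\right)}\,(s^2)^{(n-3)/2}\,\exp\left[-\frac{n(\bar X - \mu)^2 + (n - 1)s^2}{2\sigma^2}\right]d\bar X\,d(s^2).$$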
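The algebraic properties of the orthogonal transformation used above can be confirmed numerically for a particular n. The Python sketch below is an illustration only: the function name `helmert_rows` and the five trial values of u are hypothetical, and the rows follow the pattern $y_k = [u_1 + \cdots + u_{k-1} - (k-1)u_k]/\sqrt{k(k-1)}$ for $k \ge 2$.

```python
from math import isclose, sqrt


def helmert_rows(n):
    """Coefficient rows of the completely orthogonal transformation for a sample of n."""
    rows = [[1 / sqrt(n)] * n]                      # y_1 = (u_1 + ... + u_n) / sqrt(n)
    for k in range(2, n + 1):
        d = sqrt(k * (k - 1))
        # y_k = (u_1 + ... + u_{k-1} - (k - 1) u_k) / sqrt(k (k - 1))
        rows.append([1 / d] * (k - 1) + [-(k - 1) / d] + [0.0] * (n - k))
    return rows


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


u = [0.3, -1.2, 2.5, 0.7, -0.4]     # arbitrary trial values
n = len(u)
rows = helmert_rows(n)
y = [dot(row, u) for row in rows]

u_bar = sum(u) / n
sum_sq_dev = sum((x - u_bar) ** 2 for x in u)   # equals (n - 1) * gamma^2

print(sum(v * v for v in y), sum(x * x for x in u))   # total sum of squares is preserved
print(y[0] ** 2, n * u_bar ** 2)                      # y_1^2 = n * u_bar^2
print(sum(v * v for v in y[1:]), sum_sq_dev)          # remaining y's give the deviations
```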


7.10. Distribution of t. It is now proposed to obtain the distribution 
of a test criterion, independent of to be used in testing the hypothesis 
that a sample {Xil was drawn from a normal population with a specified 




mean ju. The distribution must be independent of if it is to be of 
practical use, since we seldom know the value of 
One proposed test criterion of this hypothesis is 

s/V n 

where X and s are computed from a random sample of n from the parent 
population Two properties of t should be noted: (i) t is the 

ratio between a normal deviate and the square root of an unbiased esti¬ 
mate of its variance, (ii) is the ratio between the square of a variate, 
y = \/n (X — iu)/a-, which is X(0,1), and a variate sVcr’, which is dis¬ 
tributed as x^/v with V degrees of freedom. Note that o-’^ cancels in the 
derivation of t. We shall take (ii) as the definition of t. 

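From Sec. 7.7

$$f(\bar X,s^2)\,d\bar X\,d(s^2) = k\,(s^2)^{(n-3)/2}\,e^{-[n(\bar X-\mu)^2+(n-1)s^2]/2\sigma^2}\,d\bar X\,d(s^2), \qquad -\infty < \bar X < \infty,\ 0 < s^2 < \infty,$$

where

$$k = \sqrt{\frac{n}{2\pi}}\left(\frac{n-1}{2}\right)^{(n-1)/2}\frac{1}{\sigma^n\,\Gamma[(n-1)/2]}.$$

Writing $f(\bar X,s^2)$ as a function of $t$ and $s^2$, we obtain

$$f(t,s^2)\,d(s^2)\,dt = k'(s^2)^{(n-2)/2}\exp\left[-\frac{(n-1)s^2}{2\sigma^2}\left(1+\frac{t^2}{n-1}\right)\right]d(s^2)\,dt.$$

Let

$$w = \frac{(n-1)s^2}{2\sigma^2}\left(1+\frac{t^2}{n-1}\right); \qquad t = t;$$

then

$$f_{n-1}(t)\,dt = k''\left(1+\frac{t^2}{n-1}\right)^{-n/2}dt\int_0^\infty w^{(n/2)-1}e^{-w}\,dw = k'''\left(1+\frac{t^2}{n-1}\right)^{-n/2}dt, \qquad -\infty < t < \infty.$$

Since the area under the curve must be unity for a probability distribution and because of symmetry, we have

$$2k'''\int_0^\infty\left(1+\frac{t^2}{n-1}\right)^{-n/2}dt = 1;$$

then

$$k''' = \frac{\Gamma(n/2)}{\sqrt{(n-1)\pi}\;\Gamma[(n-1)/2]}.$$

To test the hypothesis that $\mu = \mu_0$, we compute

$$t_0 = \frac{\bar X - \mu_0}{s/\sqrt n},$$

where $\bar X = \sum X_i/n$ and $s = \sqrt{\sum (X_i - \bar X)^2/(n-1)}$. Then a tabular value of $t_\alpha$ is obtained for $(n-1)$ degrees of freedom, and $|t_0|$ is compared with $t_\alpha$. If $|t_0|$ is greater than $t_\alpha$, there is evidence that the true mean, $\mu$, is different from $\mu_0$. If $|t_0|$ is less than $t_\alpha$, there is no evidence that $\mu$ is different from $\mu_0$.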
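The normalizing constant of the $t$ distribution can be checked by numerical integration. The sketch below is only illustrative: it assumes the density has the form $c\,[1+t^2/(n-1)]^{-n/2}$ with $c = \Gamma(n/2)/\{\sqrt{(n-1)\pi}\,\Gamma[(n-1)/2]\}$, uses a hand-written Simpson's rule, and truncates the infinite range at $\pm 60$; the sample size `n = 6` is hypothetical.

```python
import math

def t_density(t, n):
    """Density of t for a sample of n, that is, (n - 1) degrees of freedom."""
    const = math.gamma(n / 2) / (math.sqrt((n - 1) * math.pi) * math.gamma((n - 1) / 2))
    return const * (1.0 + t * t / (n - 1)) ** (-n / 2)

def simpson(func, a, b, steps=20000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

n = 6                                             # hypothetical sample size
area = simpson(lambda t: t_density(t, n), -60.0, 60.0)
upper_half = simpson(lambda t: t_density(t, n), 0.0, 60.0)
print("total area:", area, " area above zero:", upper_half)
```

The total area should be very nearly 1 and the area above zero very nearly 1/2, in line with the symmetry argument used to fix the constant.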

Similarly if $\bar X$ is used as an estimate of $\mu$ and

$$s^2 = \sum_{i=1}^{n} (X_i - \bar X)^2/(n-1)$$

as an estimate of $\sigma^2$, the two-tailed confidence limits for $\mu$ are

$$\bar X - t_\alpha\frac{s}{\sqrt n} < \mu < \bar X + t_\alpha\frac{s}{\sqrt n}$$

at the $(1-\alpha)$ confidence probability level.

Values of $t_\alpha$ are given for various values of $\alpha$ and various degrees of freedom in Table III in the Appendix. For the location of other tables and methods of obtaining exact values, see Bancroft. Note that the tabular values are for a two-tailed test. If it is desired to use $P(t > t_\alpha)$, the tabular probabilities must be divided by 2.
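The test and the confidence limits just described can be sketched in a few lines. This is a minimal illustration, not a prescribed routine: the function names and the five observations are hypothetical, and the tabular value $t_{.05} = 2.776$ for 4 degrees of freedom must be supplied by the user from a table of $t$.

```python
import math

def one_sample_t(xs, mu0):
    """Return (mean, s, t0) for testing the hypothesis mu = mu0."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    t0 = (mean - mu0) / (s / math.sqrt(n))
    return mean, s, t0

def confidence_limits(mean, s, n, t_alpha):
    """Two-tailed limits: mean -/+ t_alpha * s / sqrt(n)."""
    half_width = t_alpha * s / math.sqrt(n)
    return mean - half_width, mean + half_width

sample = [12.0, 15.0, 11.0, 14.0, 13.0]           # hypothetical observations
mean, s, t0 = one_sample_t(sample, mu0=10.0)
lower, upper = confidence_limits(mean, s, len(sample), t_alpha=2.776)

print("mean =", mean, " s =", s, " t0 =", t0)
print("95 per cent limits:", lower, upper)
print("reject mu = 10 at the .05 level:", abs(t0) > 2.776)
```

Note that rejecting $\mu = \mu_0$ at the $\alpha$ level is equivalent to $\mu_0$ falling outside the $(1-\alpha)$ confidence limits.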

7.11. Test of the Difference between Two Means. Another important use of the $t$ distribution is in testing a hypothesis concerning the difference of two population means. Suppose that we have two random samples $\{X_{1i}\}$ and $\{X_{2i}\}$ from populations $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, respectively, with $\sigma_1^2 = \sigma_2^2 = \sigma^2$. We wish to test the hypothesis that $\mu_1 = \mu_2$. The procedure in constructing a suitable test criterion will be to (i) find a function of the observations whose distribution is independent of $\sigma^2$ and which involves the difference between the population means $\mu_1$ and $\mu_2$, and (ii) find the sampling distribution of the function on the assumption that the hypothesis tested is true, for example, in this case, $\mu_1 = \mu_2$. Let the samples have the following characteristics:

Sample          Size     Mean          Variance
$\{X_{1i}\}$    $n_1$    $\bar X_1$    $s_1^2$
$\{X_{2i}\}$    $n_2$    $\bar X_2$    $s_2^2$


Now, $(n_1-1)s_1^2/\sigma_1^2$ is distributed as chi-square with $(n_1-1)$ degrees of freedom, and $(n_2-1)s_2^2/\sigma_2^2$ is distributed as chi-square with $(n_2-1)$ degrees of freedom. Since $\sigma_1^2 = \sigma_2^2 = \sigma^2$,

$$\frac{(n_1-1)s_1^2}{\sigma_1^2} + \frac{(n_2-1)s_2^2}{\sigma_2^2} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{\sigma^2}.$$

By the reproductive property of chi-square, assuming the two samples are independently drawn, this quantity is distributed as chi-square with $(n_1+n_2-2)$ degrees of freedom. But

$$s^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

is an unbiased estimate of $\sigma^2$ since its expected value is $\sigma^2$. Also the variance of $(\bar X_1-\mu_1) - (\bar X_2-\mu_2)$ is

$$\sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right).$$

The quantity

$$\frac{(\bar X_1-\mu_1) - (\bar X_2-\mu_2)}{s\sqrt{(1/n_1)+(1/n_2)}} = \frac{(\bar X_1-\bar X_2) - (\mu_1-\mu_2)}{s\sqrt{(1/n_1)+(1/n_2)}}$$

is the ratio of a normal deviate to the square root of an unbiased estimate of its variance. Writing the square of the quantity as

$$\frac{[(\bar X_1-\bar X_2) - (\mu_1-\mu_2)]^2\Big/\sigma^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}{s^2/\sigma^2},$$

we see that it is now the ratio of the square of a variate which is $N(0,1)$ and a variate which is distributed as $\chi^2/(n_1+n_2-2)$ with $(n_1+n_2-2)$ degrees of freedom. But the second property is the definition of a quantity distributed as $t$. Hence, the test criterion on the assumption that the hypothesis tested ($\mu_1 = \mu_2$) is true is

$$t = \frac{\bar X_1-\bar X_2}{s\sqrt{(1/n_1)+(1/n_2)}}.$$

The two-tailed confidence limits for the true difference between the two means, $\mu_1-\mu_2 = \delta$, are given by $d - t_\alpha s_d < \delta < d + t_\alpha s_d$, where $d = \bar X_1-\bar X_2$ and $s_d = s\sqrt{(1/n_1)+(1/n_2)}$.
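The pooled test can be computed directly from the sample sizes, means, and variances. The following is a minimal sketch under the equal-variance assumption of this section; the function name and the summary figures are hypothetical, and the tabular value of $t$ for $n_1+n_2-2$ degrees of freedom is still to be read from a table.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Pooled two-sample t for the hypothesis mu_1 = mu_2 (equal variances assumed)."""
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df   # pooled estimate of sigma^2
    d = mean1 - mean2
    s_d = math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))     # standard error of d
    return d, s_d, d / s_d, df

# Hypothetical summary figures for two samples of 8.
d, s_d, t, df = pooled_t(8, 21.0, 9.0, 8, 18.0, 7.0)
print("d =", d, " s_d =", s_d, " t =", t, " degrees of freedom =", df)

def difference_limits(d, s_d, t_alpha):
    """Two-tailed confidence limits for delta = mu_1 - mu_2."""
    return d - t_alpha * s_d, d + t_alpha * s_d

print(difference_limits(d, s_d, t_alpha=2.145))     # t.05 for 14 degrees of freedom
```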

If $\sigma_1^2 \neq \sigma_2^2$, then the variance of $(\bar X_1-\mu_1) - (\bar X_2-\mu_2)$ is

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},$$

for which

$$\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$

is an unbiased estimate. Hence the first property of $t$ is met by

$$t' = \frac{(\bar X_1-\bar X_2) - (\mu_1-\mu_2)}{\sqrt{(s_1^2/n_1)+(s_2^2/n_2)}}$$

but not the second, since the square of the denominator is not distributed as $\chi^2/\nu$ with $\nu$ degrees of freedom.

We know that $(n_1-1)s_1^2/\sigma_1^2$ is distributed as chi-square with $(n_1-1)$ degrees of freedom and similarly for $(n_2-1)s_2^2/\sigma_2^2$. Hence a denominator analogous to that of $t^2$ is

$$\frac{(n_1-1)s_1^2/\sigma_1^2 + (n_2-1)s_2^2/\sigma_2^2}{n_1+n_2-2}$$

and an analogous numerator is

$$\frac{[(\bar X_1-\bar X_2) - (\mu_1-\mu_2)]^2}{(\sigma_1^2/n_1)+(\sigma_2^2/n_2)}.$$

If we set

$$t = \frac{(\bar X_1-\bar X_2) - (\mu_1-\mu_2)}{\sqrt{(\sigma_1^2/n_1)+(\sigma_2^2/n_2)}}\Bigg/\sqrt{\frac{(n_1-1)s_1^2/\sigma_1^2 + (n_2-1)s_2^2/\sigma_2^2}{n_1+n_2-2}}$$

and let $\lambda = \sigma_2^2/\sigma_1^2$ and $l = s_2^2/s_1^2$, then

$$t' = t\,\frac{\sqrt{(n_1-1)+(n_2-1)\,l/\lambda}}{\sqrt{n_1+n_2-2}}\cdot\frac{\sqrt{(1/n_1)+(\lambda/n_2)}}{\sqrt{(1/n_1)+(l/n_2)}}$$

is an analogous expression to the ordinary $t$. The exact distribution of $t'$ is not known and even if known would be of little practical use since it would undoubtedly involve the population parameters $\sigma_1^2$ and $\sigma_2^2$. Two methods have been devised for using $t'$ to make the desired test. They are described by Bartlett and Welch.

Several other methods have been suggested for testing the difference between two means when the population variances are unequal. One of the oldest methods is the Fisher-Behrens test, which is based on Fisher's concept of fiducial probability. Sukhatme has prepared tables to be used in connection with this test. Since there is considerable controversy regarding the validity of this test, it will not be presented here.

Two approximate tests have become quite popular. One by Cochran and Cox utilizes a weighted mean of the tabular $t$ values for the two samples, weighted by the two sample variances. Compute $d = \bar X_1 - \bar X_2$ and

$$s_d^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}.$$

The approximate tabular value for $t' = d/s_d$ is

$$t'_\alpha = \frac{w_1t_1 + w_2t_2}{w_1 + w_2},$$

where $w_i = s_i^2/n_i$ and $t_i$ is $t_\alpha$ for $(n_i - 1)$ degrees of freedom.

Another approximate test was suggested by Smith and further expanded by Satterthwaite. The test criterion is also $t'$, but the approximate tabular value is $t_\alpha$ with $f$ degrees of freedom, where

$$f = \frac{s_d^4}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}.$$

Dixon and Massey have a slightly different value of $f$:

$$f = \frac{s_d^4}{\dfrac{(s_1^2/n_1)^2}{n_1+1} + \dfrac{(s_2^2/n_2)^2}{n_2+1}} - 2.$$

A method will be presented in the next section to test whether $\sigma_1^2 = \sigma_2^2$, using $s_1^2$ and $s_2^2$.
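The two approximate procedures can be sketched as follows, assuming $w_i = s_i^2/n_i$ and $s_d^2 = w_1 + w_2$ as the working quantities. The function names and summary figures are hypothetical; the tabular values 2.262 and 2.776 are the two-tailed 5 per cent values of $t$ for 9 and 4 degrees of freedom and would ordinarily be read from a table.

```python
def smith_satterthwaite_df(n1, var1, n2, var2):
    """Approximate degrees of freedom f = s_d^4 / [w1^2/(n1-1) + w2^2/(n2-1)]."""
    w1, w2 = var1 / n1, var2 / n2
    sd2 = w1 + w2
    f = sd2 ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))
    return w1, w2, sd2, f

def cochran_cox_tabular(w1, w2, t1, t2):
    """Weighted mean of the tabular t values, weights w_i = s_i^2 / n_i."""
    return (w1 * t1 + w2 * t2) / (w1 + w2)

# Hypothetical summary figures: n1 = 10, s1^2 = 40; n2 = 5, s2^2 = 5.
w1, w2, sd2, f = smith_satterthwaite_df(10, 40.0, 5, 5.0)
t_cc = cochran_cox_tabular(w1, w2, t1=2.262, t2=2.776)

print("s_d^2 =", sd2, " approximate degrees of freedom f =", f)
print("Cochran and Cox tabular value:", t_cc)
```

In general $f$ is not an integer, so the tabular $t$ must be interpolated or the next smaller integer used. The Cochran and Cox value always lies between the two tabular values entering the weighted mean.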

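7.12. Distribution of F. Another very useful statistic in making tests of hypotheses involves the ratio of two chi-squares. Let $\chi_1^2$ and $\chi_2^2$ be independently distributed with $n_1$ and $n_2$ degrees of freedom, respectively. Then their simultaneous or joint distribution is

$$f(\chi_1^2,\chi_2^2)\,d(\chi_1^2)\,d(\chi_2^2) = \frac{e^{-(\chi_1^2+\chi_2^2)/2}}{4\,\Gamma(n_1/2)\,\Gamma(n_2/2)}\left(\frac{\chi_1^2}{2}\right)^{(n_1-2)/2}\left(\frac{\chi_2^2}{2}\right)^{(n_2-2)/2}d(\chi_1^2)\,d(\chi_2^2).$$

Let

$$F = \frac{n_2\chi_1^2}{n_1\chi_2^2} \qquad\text{or}\qquad \chi_1^2 = \frac{n_1F\chi_2^2}{n_2}.$$

Then

$$f(F,\chi_2^2) = \frac{n_1\,e^{-(\chi_2^2/2)[1+(n_1F/n_2)]}}{2n_2\,\Gamma(n_1/2)\,\Gamma(n_2/2)}\left(\frac{n_1F}{n_2}\right)^{(n_1-2)/2}\left(\frac{\chi_2^2}{2}\right)^{(n_1+n_2-2)/2}$$

and

$$f(F)\,dF = \left[\int_0^\infty f(F,\chi_2^2)\,d(\chi_2^2)\right]dF = \frac{\Gamma[(n_1+n_2)/2]}{\Gamma(n_1/2)\,\Gamma(n_2/2)}\left(\frac{n_1}{n_2}\right)^{n_1/2}F^{(n_1/2)-1}\left(1+\frac{n_1F}{n_2}\right)^{-(n_1+n_2)/2}dF, \qquad 0 < F < \infty.$$

Since $n_1$ and $n_2$ are degrees of freedom,

$$\chi_1^2 = \frac{n_1s_1^2}{\sigma_1^2} \qquad\text{and}\qquad \chi_2^2 = \frac{n_2s_2^2}{\sigma_2^2},$$

where $s_1^2$ and $s_2^2$ are two independent sample estimates of $\sigma_1^2$ and $\sigma_2^2$, respectively. Then

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$

This function provides a test of the hypothesis $\sigma_1^2 = \sigma_2^2$. Then

$$F = \frac{s_1^2}{s_2^2},$$

which is the ratio of the independent sample estimates of two assumed equal variances.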

A tabular value, $F_\alpha$, is obtained for $n_1$ and $n_2$ degrees of freedom, such that

$$P(F > F_\alpha) = \int_{F_\alpha}^{\infty} f(F)\,dF = \alpha.$$

$\alpha$ is the probability that a value of $F$ as large as or larger than $F_\alpha$ could have been obtained from two random samples from two normal populations with the same variances. For a fixed $\alpha$, $F_0$ is compared with $F_\alpha$. If $F_0$ is greater than $F_\alpha$, this is evidence that $\sigma_1^2 > \sigma_2^2$. If $F_0$ is less than $F_\alpha$, there is insufficient evidence to state that $\sigma_1^2 > \sigma_2^2$. It should be reemphasized that there is always a chance of being wrong in stating that $\sigma_1^2 > \sigma_2^2$ when $F_0 > F_\alpha$. If we make a large number of such tests, we would expect to be wrong less than $100\alpha$ per cent of the time. If we further protect ourselves from such mistakes by decreasing $\alpha$, we shall unfortunately reduce the number of times real differences are indicated. A further discussion of this problem will be taken up in Chap. 11.

In the above discussions we have been assuming that the only alternative to $\sigma_1^2 = \sigma_2^2$ is that $\sigma_1^2 > \sigma_2^2$. Of course, even if $\sigma_1^2 > \sigma_2^2$, the sample $F_0$ might be less than 1. Under certain conditions of an applied problem it may not be valid to assume that the only alternative to the hypothesis $\sigma_1^2 = \sigma_2^2$ is $\sigma_1^2 > \sigma_2^2$. If $\sigma_1^2$ may also be less than $\sigma_2^2$, then the alternative hypothesis is $\sigma_1^2 \neq \sigma_2^2$, and it becomes necessary to consider both tails of the $F$ distribution in testing the hypothesis. In this case, we compute $F_0$ as the larger mean square divided by the smaller and compare it with $F_{\alpha/2}$ instead of $F_\alpha$, if we wish to test at the $\alpha$ significance probability level. Or, conversely, we can use $F_\alpha$ as the tabular value and test at the $2\alpha$ probability level.

In Sec. 7.11 it was shown that $t$ may be used to test the hypothesis that $\mu_1 = \mu_2$ on the assumption that $\sigma_1^2 = \sigma_2^2$. Suppose that for the two samples $\{X_{1i}\}$ and $\{X_{2i}\}$ there is some reason to believe that the population variances are not equal but that there is no a priori reason for one to be larger than the other. In this case it is necessary to use the two-tailed $F$ test in testing for the significance of the difference between the two variances. On the other hand, in applying the $F$ test to an analysis-of-variance table, to be explained later, the usual alternative hypothesis is $\sigma_1^2 > \sigma_2^2$, and a single-tailed $F$ test is used.
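Where a table of $F$ is not at hand, the tail probability can be approximated by integrating the density of Sec. 7.12 numerically. The sketch below is illustrative only: it uses Simpson's rule on $(0, F_0)$, is restricted to $n_1 \geq 2$ (for $n_1 = 1$ the density is unbounded at zero), and the two mean squares are hypothetical.

```python
import math

def f_density(F, n1, n2):
    """Density of F with n1 and n2 degrees of freedom (form given in Sec. 7.12)."""
    const = math.gamma((n1 + n2) / 2) / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
    return (const * (n1 / n2) ** (n1 / 2) * F ** (n1 / 2 - 1)
            * (1.0 + n1 * F / n2) ** (-(n1 + n2) / 2))

def upper_tail(F0, n1, n2, steps=2000):
    """P(F > F0) = 1 - integral of f(F) over (0, F0); steps must be even, n1 >= 2."""
    h = F0 / steps
    total = f_density(0.0, n1, n2) + f_density(F0, n1, n2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, n1, n2)
    return 1.0 - total * h / 3.0

# Hypothetical mean squares, each with 4 degrees of freedom.
s1_sq, s2_sq = 30.0, 6.0
F0 = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)        # larger over smaller
one_tailed = upper_tail(F0, 4, 4)
two_tailed = 2.0 * one_tailed                     # alternative: variances unequal

print("F0 =", F0, " one-tailed:", one_tailed, " two-tailed:", two_tailed)
```

Doubling the upper-tail probability corresponds to comparing $F_0$ with $F_{\alpha/2}$ when the larger mean square is placed in the numerator. When the two mean squares have different degrees of freedom, the degrees of freedom of the larger mean square must be passed as `n1`.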



Tabular values of $F_{.05}$ and $F_{.01}$ are presented in Table IV in the Appendix for various values of $n_1$ and $n_2$. Norton has computed values of $F_{.20}$ and Colcord and Deming of $F_{.001}$. All of these tables may be found in reference 14. See Bancroft for a complete description of where such tables can be found.


EXERCISES 

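7.1. (a) Set up the joint probability distribution of the sample of size $n$ from a bivariate normal distribution $(X_1,X_2)$ with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$. (b) Show that the simultaneous distribution of $\bar X_1$ and $\bar X_2$ is

$$f(\bar X_1,\bar X_2)\,d\bar X_1\,d\bar X_2 = \frac{n}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,e^{-\theta}\,d\bar X_1\,d\bar X_2,$$

where

$$\theta = \frac{n}{2(1-\rho^2)}\left[\frac{(\bar X_1-\mu_1)^2}{\sigma_1^2} + \frac{(\bar X_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(\bar X_1-\mu_1)(\bar X_2-\mu_2)}{\sigma_1\sigma_2}\right], \qquad -\infty < \bar X_1 < \infty,\ -\infty < \bar X_2 < \infty.$$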


7.2. Given a normal population with $\sigma^2 = 25$. A sample of 12 produced the following values: 24, 30, 26, 33, 21, 24, 20, 32, 24, 30, 33, 34.

(a) Test the hypothesis that the true mean of the population is 24, using the $\alpha = .05$ significance probability.

(b) Make the same test as (a) if $\alpha = .01$. What is the difference between the two tests?

(c) Set up a 95 per cent confidence interval for the true mean.

(d) What is the connection between the test of (a) and the confidence interval of (c)?

7.3. Given the distribution of $\chi^2$ as in Sec. 7.5, find the distribution of

$$\frac{\chi^2}{1+\chi^2}.$$

7.4. Given the two functions

$$u = a_1\chi_1^2 + a_2\chi_2^2,$$
$$v = \chi_1^2 + \chi_2^2.$$

Find the distribution of $w = u/v$. What effect would the provision $a_1 > a_2 > 0$ have on the results?

7.5. If $X_1$ and $X_2$ are NID(0,1), show that the variables $r$ and $\theta$ defined as follows,

$$X_1 = r\sin\theta, \qquad X_2 = r\cos\theta,$$

are independently distributed and that $r^2$ is distributed as $\chi^2$ with 2 degrees of freedom.

7.6. Given the bivariate normal distribution with means zero and variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$. Show that

$$\theta = \frac{1}{1-\rho^2}\left[\frac{X_1^2}{\sigma_1^2} + \frac{X_2^2}{\sigma_2^2} - \frac{2\rho X_1X_2}{\sigma_1\sigma_2}\right]$$

is distributed as chi-square with 2 degrees of freedom. Hint: Show that the moment generating function of $\theta$ is the same as that of $\chi^2$ with 2 degrees of freedom.

7.7. Using the result in Exercise 7.6, find the probability that $\theta < c^2$.

7.8. In Exercise 7.6 let $\sigma_1^2 = \sigma_2^2 = 1$ and $\rho = 0$. Find the equation of the circle with center at the origin which would contain 99 per cent of a large number of samples of $X_1$ and $X_2$.

7.9. A random sample of $n$ individuals $\{X_i\}$ is drawn from a $N(0,1)$ population. Suppose the sample is subdivided into $r$ subclasses with two or more in each ($n_1, n_2, \ldots, n_r$ being the numbers in the subclasses). The sum of squares $q_i$ ($i = 1, 2, \ldots, r$) of deviations from the subclass mean is computed for each subclass. What is the distribution of $\sum_{i=1}^{r} q_i$?

7.10. Compute the 1 per cent and 5 per cent tabular values of $\chi^2$ for $\nu = 1$, 2, and 4. Check with the results in Table II in the Appendix.

7.11. Given $(s')^2/\sigma^2 = 2$ with $n = 4$. What is the probability of obtaining a value this large or larger? Check your results by interpolating with the tabled values.

7.12. Use the data in Exercise 7.2 to test the hypothesis that $\sigma^2 = 25$, assuming $\mu = 30$ and $\alpha = .05$. Also set up 95 per cent confidence limits for $\sigma^2$.

7.13.† (a) Show that the results of Sec. 7.7 can now be extended to test hypotheses and to set up confidence limits for $\sigma^2$ without assuming $\mu$ known, where $s^2$ replaces $(s')^2$.

(b) Use the data in Example 7.1 to test the hypothesis that $\sigma^2 = 90$, when no assumption is made concerning $\mu$. Set up 95 per cent confidence limits for $\sigma^2$.

(c) Repeat Exercise 7.12 when no assumption is made concerning $\mu$.

(d) What is the advantage of the use of $s^2$ instead of $(s')^2$ in making statements concerning $\sigma^2$?

7.14. Show that the distribution of $s^2$ approaches normality for large $n$. What are the skewness and kurtosis of $s^2$?

7.15. Find the moment generating function of $\log_e s$, where $s^2$ is an unbiased estimate of variance. Show that the distribution of $\log_e s$ is independent of $\sigma^2$ apart from its mean value.

† Exercise 7.13 should be worked by everyone.

7.16. Given $s^2/\sigma^2 = 2$ with $n = 5$, where $s^2 = \sum_{i=1}^{n} (X_i - \bar X)^2/(n-1)$. What is the probability of obtaining a value this large or larger? Check your results by interpolating with the tabular values.

7.17. The distribution of $t$ for 1 degree of freedom is the so-called “Cauchy distribution.” What about its mean and variance? Compute the tabular values of this $t$ for $\alpha = .01$ and .05, using the two-tailed test.

7.18. (a) Given a random sample of $n$ paired observations ($A$ and $B$). The sample mean of the $A$'s is $\bar X_a$, and that of the $B$'s is $\bar X_b$. The $A$'s and $B$'s come from normal populations with the same variance. It is desired to test whether the population means are equal. The value of the $i$th members of $A$ and $B$ may be represented, respectively, as

$$A_i = \mu_a + p_i + e_{ai},$$
$$B_i = \mu_b + p_i + e_{bi},$$

where $\mu_a$ and $\mu_b$ are the population mean effects of $A$ and $B$, and $p_i$ represents the effect common to the two paired observations, assumed $N(0,\sigma_p^2)$. The $e_{ai}$ and $e_{bi}$ represent residual effects for each observation not explained by the $\mu$'s and $p_i$ and independent of them. The $e_{ai}$ and $e_{bi}$ are assumed $N(0,\sigma^2)$.

If $\mu_a = \mu_b$ and

$$s^2 = \frac{\sum_{i=1}^{n}[(A_i - B_i) - (\bar X_a - \bar X_b)]^2}{n-1},$$

show that

$$t = \frac{\bar X_a - \bar X_b}{s/\sqrt n}$$

follows the $t$ distribution with $(n-1)$ degrees of freedom.

(b) Compare $t$ in (a) with that obtained by pooling the two sample sums of squares of deviations.

7.19. Show that the $t$ distribution approaches normality for large $n$.

7.20. The distribution of the correlation coefficient $r$ in samples of $n$ from a normal bivariate population with $\rho = 0$ is

$$f(r)\,dr = \frac{\Gamma[(n-1)/2]}{\sqrt\pi\;\Gamma[(n-2)/2]}\,(1-r^2)^{(n-4)/2}\,dr, \qquad -1 < r < 1.$$

Show that

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

is distributed as $t$ with $(n-2)$ degrees of freedom.

7.21. Compute the 5 per cent tabular value of $t$ (two-tailed test) for 2 degrees of freedom. What is the probability that $|t| > 3$?

7.22. In a certain test, given to the 45 members of a class in statistics, 20 women students had an average score of 40 with a variance of 16, while 25 men students had an average of 46 with a variance of 16. What is the probability of obtaining such results if both groups were equally well prepared for the test? What assumptions are made in obtaining this probability value? On the basis of this sample and the validity of the assumptions is there much evidence that both groups are not equally well prepared? What are the 95 per cent confidence limits for the difference between the two means?

7.23. Given the following data on the gains (in pounds) by 27 hogs, in 
individual and similar pens, 12 fed ration A and 15 ration B. Test the 
hypothesis that the population mean gains are the same, and set up 
99 per cent confidence limits for the difference between the two means. 


A: 25, 30, 28, 34, 24, 25, 13, 32, 24, 30, 31, 35 

B: 44, 34, 22, 8, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22 


7.24. In a certain school 34 girl beginners in the first grade were selected and paired on the basis of IQ, socioeconomic rating of family, general health, and family size. One member of each pair had attended one year of kindergarten, while the other had not. On a certain first-grade readiness test given to all 34 pupils the scores were:

Kindergarten      65 68 70 63 64 62 73 75 72 78 64 73 79 80 67 74 82
No kindergarten   63 68 68 60 65 60 72 75 73 70 66 70 77 78 63 74 78

Is there significant evidence from this investigation that kindergarten is of benefit in preparing for the first grade? In making use of the $t$ test what null hypothesis is made concerning $\mu = \mu_1 - \mu_2$? How many degrees of freedom do you use here? What must be true concerning the distribution of the paired differences? Set up 95 per cent confidence limits for $\mu$.

7.25. Cochran presents an example of an experiment with a control and six chalk and lime treatments on the number of mangolds per plot. Suppose we compare the control numbers with those having a large amount of chalk or lime. The numbers of mangolds per plot were:

Group      Numbers of mangolds per plot              Total
Control    140, 142, 36, 129, 49, 37, 114, 125         772
Treated    117, 137, 137, 143, 130, 112, 130, 121     1027

(a) Calculate the sample mean and variance for each group.

(b) Assume that the two population variances are equal, and test for the difference between the two population means.

(c) Assume that the two population variances are unequal. Test for the difference between the two population means by the Cochran and Cox, Smith-Satterthwaite, and Dixon and Massey tests.

7.26. Show that in Sec. 7.12

$$f(F) = \frac{(n_1/n_2)^{n_1/2}\,F^{(n_1/2)-1}}{B(n_1/2,\,n_2/2)\,[1+(n_1F/n_2)]^{(n_1+n_2)/2}}.$$

7.27. Show that

$$\int_0^\infty f(F)\,dF = 1.$$

7.28. When $n_1 = 1$, show that $F = t^2$.

7.29. Let $F(n_1,n_2)$ be the distribution of $F$ with $n_1$ degrees of freedom for the numerator and $n_2$ for the denominator. (a) Derive the distribution of

$$F' = \frac{1}{F(n_1,n_2)}.$$

(b) Hence show that

$$P\left[F(n_2,n_1) < \frac{1}{X}\right] = P[F(n_1,n_2) > X].$$

(c) Use this result to show why we can use $F_\alpha$ as a tabular value to test at the $2\alpha$ probability level for a two-tailed $F$ test.

7.30. Show that $P(F > F_\alpha)$ is an incomplete Beta function. Set up this function. Hint: Let

$$X = \frac{n_1F/n_2}{1+(n_1F/n_2)}.$$

7.31. Use the result in Exercise 7.30 to determine a general formula for $\mu_i$ of $F$.

7.32. Determine the 5 per cent tabular value for $F$ if (a) $n_1 = 2$, $n_2 = 4$, and (b) $n_1 = 4$, $n_2 = 4$.

7.33. What is $P(F > 4)$ for $n_1 = 4$, $n_2 = 4$?

7.34. Given $s_1^2 = 40$ and $s_2^2 = 10$, each based on 4 degrees of freedom. Determine the probability that sample variances as divergent as these could be estimates of the same population variance with an alternative hypothesis that $\sigma_1^2 \neq \sigma_2^2$.






7.35. Given that one mean square, 80, with 4 degrees of freedom, is an estimate of $\sigma^2 + 5\sigma_1^2$, while another mean square, 10, with 20 degrees of freedom, is an estimate of $\sigma^2$. Use an $F$ table to test the hypothesis that $\sigma_1^2 = 0$.

7.36. Test the hypothesis that the two population variances in Exercise 7.25 were equal.

7.37. R. A. Fisher first considered the problem of the ratio of two variances as the difference between the logarithms of the two variances. If we let

$$z = \tfrac{1}{2}\log_e F,$$

we obtain his $z$ distribution, which is more nearly normal than the $F$ distribution. Show that

$$f(z)\,dz = \frac{2\,n_1^{n_1/2}\,n_2^{n_2/2}}{B(n_1/2,\,n_2/2)}\cdot\frac{e^{n_1z}}{(n_1e^{2z}+n_2)^{(n_1+n_2)/2}}\,dz, \qquad -\infty < z < \infty.$$

7.38. Show that $f(z)$ is symmetrical if $n_1 = n_2$.

CHAPTER 8


INTRODUCTION TO POINT ESTIMATION AND CRITERIA 
OF “GOODNESS” 

8.1. Introduction. A worker in animal husbandry may wish to estimate the mean gain in weight of swine, of the same breed, sex, and age, fed on the same ration and managed similarly for a period of 20 days. To this end the following sample set of gains in pounds was obtained: 27, 28, 28, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 32, 32, 33. Now, if it be assumed that these observations are a random sample from a population of such gains which are normally distributed with mean, $\mu$, then we must decide what function of the sample observations must be used to obtain a “good” or “best” estimate of $\mu$. We must, of course, define “good” or “best” in some technical sense. As will be shown later, it turns out that the sample mean, $\bar X = 30$, is a “good” estimate of $\mu$. Our investigation concerned itself then with drawing a sample from some population of specified mathematical form for the purpose of estimating a parameter of the population. In subsequent discussions the function of the observations chosen to estimate the population parameter, in this case $\bar X$, will be called an estimator, while the particular numerical value obtained in an application will be called an estimate.

8.2. The Problem of Point Estimation. A sample $X_1, X_2, \ldots, X_n$ is specified as having been drawn from a common population with distribution function $f(X;\theta_1,\theta_2,\ldots,\theta_k)$, where $X$ is the variate and $\theta_1, \theta_2, \ldots, \theta_k$ are population parameters. We wish to find functions of the observations, say $\hat\theta_1(X_1,X_2,\ldots,X_n)$, $\hat\theta_2(X_1,X_2,\ldots,X_n)$, $\ldots$, $\hat\theta_k(X_1,X_2,\ldots,X_n)$, such that the distribution of these functions will be concentrated closely about the respective true values of the parameters. By saying that the distribution of $\hat\theta_i(X_1,X_2,\ldots,X_n)$ will be concentrated closely about the true value we can mean one of several conditions, such as

(a) The probability that the estimator falls within a short distance of the true value shall be large regardless of the fact that this requirement may be satisfied only by an estimator which is distributed in such a way that there is a possibility (though small) of a very large deviation from the parameter.

(b) The probability that the estimator falls more than a specified distance from the parameter shall be negligible, or even zero, regardless of how the estimator is distributed inside this region.

(c) Or we may be willing to have the estimator deviate in one direction from the parameter but may wish to minimize the chance of large deviations in the other direction.

These three conditions may be represented by estimators which are distributed as shown in Fig. 8.1 ($\theta$ is the parameter).

There are other considerations which might influence one in deciding whether or not a given estimator is a “good” one. For example, is the estimator appropriate for small samples as well as large? It also might be possible to set down a cost of a given deviation of an estimator from the parameter, the cost presumably being an increasing function of the size of the deviation. Then, if we could determine this cost function for various proposed estimators, it would be possible to estimate the total cost for each and adopt that estimator which produced a minimum cost.

[Fig. 8.1. Distributions of estimators satisfying conditions (a), (b), and (c); ordinate $f(\hat\theta)$.]

Most of these problems of estimation are resolved if all the estimators are normally distributed. In this case, the best estimator can reasonably be assumed to be that one which has the minimum variance. However, it is difficult to say which estimator is superior if one is distributed normally and the other uniformly, for example. Although most common estimators are asymptotically normally distributed ($n$ large), few are normal for small samples and many are far from normal in this case. On the whole we can advance only a few guiding principles to be used in deciding whether an estimator is “good” or not. While these principles have yielded fruitful results, much work remains to be done, especially regarding nonnormally distributed estimators and nonrandom samples.

In the following sections we shall define certain characteristics of a “good” estimator and introduce a method which sometimes yields an estimator which satisfies all of these characteristics. Much of the philosophy and theory of estimation are a result of the work of R. A. Fisher (see reference 1).

8.3. Unbiasedness. An estimator $\hat\theta$ is said to be unbiased if $E(\hat\theta) = \theta$. It is said to be positively or negatively biased, respectively, according as to whether $E(\hat\theta) \gtrless \theta$. Unbiasedness is a desirable but not necessarily an indispensable property of a “good” estimator. If the amount of bias is small compared with the standard deviation of the estimator, the estimator though biased may be entirely satisfactory.
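Bias can be exhibited exactly for a small discrete population by enumerating every possible sample. The sketch below is an illustration under assumed conditions: a hypothetical population of three equally likely values, samples of 2 drawn with replacement, and exact arithmetic with fractions. It compares the divisor $n-1$ with the divisor $n$ in the estimator of $\sigma^2$.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2]                  # hypothetical, equally likely values
n = 2                                   # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

def expected_value(divisor):
    """Exact expectation of sum (X - Xbar)^2 / divisor over all samples of n."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        total += sum((Fraction(x) - mean) ** 2 for x in sample) / divisor
    return total / len(samples)

e_unbiased = expected_value(n - 1)      # divisor n - 1
e_biased = expected_value(n)            # divisor n

print("sigma^2 =", sigma2)
print("E with divisor n - 1 =", e_unbiased)
print("E with divisor n     =", e_biased)
```

The divisor $n$ gives an expectation of $(n-1)\sigma^2/n$, a negative bias which becomes small relative to the standard deviation of the estimator as $n$ increases.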

8.4. Consistency. An estimator $\hat\theta$ is said to be a consistent estimator of $\theta$ if $\hat\theta$ converges stochastically to $\theta$ as $n$ approaches $\infty$. Symbolically $\hat\theta$ will converge stochastically to $\theta$ as $n$ approaches $\infty$, if for two arbitrarily small positive numbers, $\epsilon$ and $\eta$, a large enough sample can be taken so that

$$P[|\hat\theta - \theta| > \epsilon] < \eta.$$

[Fig. 8.2; ordinate $f(X)$.]

A useful relation for proving consistency is furnished by Tchebysheff's inequality: Given a random variate $X$ with distribution function $f(X)$, mean $\mu$, and variance $\sigma^2$, assumed to exist, then for a given $\delta$ ($>0$)

$$P[|X - \mu| > \delta\sigma] < \frac{1}{\delta^2}.$$

The inequality may be established from Fig. 8.2. It follows that

$$\sigma^2 = \int_{-\infty}^{\infty} (X-\mu)^2 f(X)\,dX = \int_{-\infty}^{\mu-\delta\sigma} (X-\mu)^2 f(X)\,dX + \int_{\mu-\delta\sigma}^{\mu+\delta\sigma} (X-\mu)^2 f(X)\,dX + \int_{\mu+\delta\sigma}^{\infty} (X-\mu)^2 f(X)\,dX.$$

Also, since the second integral is positive,

$$\sigma^2 > \int_{-\infty}^{\mu-\delta\sigma} (X-\mu)^2 f(X)\,dX + \int_{\mu+\delta\sigma}^{\infty} (X-\mu)^2 f(X)\,dX.$$

Since, in the range of integration, $|X - \mu| > \delta\sigma$, the factor $(X-\mu)^2$ may be replaced by $\delta^2\sigma^2$ and

$$\sigma^2 > \delta^2\sigma^2\left[\int_{-\infty}^{\mu-\delta\sigma} f(X)\,dX + \int_{\mu+\delta\sigma}^{\infty} f(X)\,dX\right].$$

It is now evident that

$$\frac{1}{\delta^2} > P(|X - \mu| > \delta\sigma),$$

or

$$P[|X - \mu| > \delta\sigma] < \frac{1}{\delta^2}.$$

It should be noted that this inequality holds for any distribution but is not very efficient for a normal distribution. For example, if $X$ is $N(\mu,\sigma^2)$, then the exact probability, $\alpha$, for $\delta = 2$ is .05, while Tchebysheff's inequality gives $\alpha < .25$.
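The comparison between the bound and the exact normal probability is easily tabulated. The following minimal sketch uses the complementary error function of the standard library; the function names and the particular values of $\delta$ are illustrative choices.

```python
import math

def tchebysheff_bound(delta):
    """Upper bound 1/delta^2 on P(|X - mu| > delta * sigma), any distribution."""
    return 1.0 / delta ** 2

def normal_two_tail(delta):
    """Exact P(|X - mu| > delta * sigma) when X is normally distributed."""
    return math.erfc(delta / math.sqrt(2.0))

for delta in (1.5, 2.0, 3.0):
    print(delta, tchebysheff_bound(delta), normal_two_tail(delta))
```

For $\delta = 2$ the exact normal probability is about .0455, which rounds to the .05 quoted above, against the bound of .25.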

In the limit a consistent estimator will necessarily be unbiased although for finite sample sizes the consistent estimator may be biased. An unbiased estimator may or may not be consistent. It can be proved (Cramér) that an unbiased estimator will be consistent if $\sigma_{\hat\theta}^2 \to 0$ as $n \to \infty$. Note that consistency is a large sample criterion, while bias is applied to small samples as well.

There are usually many consistent estimators of $\theta$; hence, other criteria are needed to select the “best” from among those with the property of consistency. In general, consistency is a desirable property of an estimator.

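Example 8.1. Since $E(\bar X) = \mu$, the sample mean, $\bar X$, is an unbiased estimator of the population mean, $\mu$. Also, if $\sigma^2$ exists, $\sigma_{\bar X}^2 = \sigma^2/n$ and then, by Tchebysheff's inequality,

$$P\left[|\bar X - \mu| > \frac{\delta\sigma}{\sqrt n}\right] < \frac{1}{\delta^2}.$$

Now, if we set $\epsilon = \delta\sigma/\sqrt n$ and $\eta = 1/\delta^2$, then $\bar X$ will converge stochastically to $\mu$, if we choose $n > \sigma^2/(\eta\epsilon^2)$. Hence $\bar X$ is a consistent estimator for $\mu$.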
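The sample size required by this argument follows at once from $\epsilon$, $\eta$, and $\sigma$. The sketch below assumes the condition $n > \sigma^2/(\eta\epsilon^2)$, obtained by eliminating $\delta$ from $\epsilon = \delta\sigma/\sqrt n$ and $\eta = 1/\delta^2$; the function name and the numerical values are hypothetical.

```python
import math

def sample_size_for(sigma, eps, eta):
    """Smallest integer n with n > sigma^2 / (eta * eps^2)."""
    return math.floor(sigma ** 2 / (eta * eps ** 2)) + 1

sigma, eps, eta = 2.0, 0.5, 0.25        # hypothetical values
n_required = sample_size_for(sigma, eps, eta)

# Tchebysheff bound on P(|Xbar - mu| > eps) for that n: sigma^2 / (n * eps^2).
bound = sigma ** 2 / (n_required * eps ** 2)
print("n =", n_required, " bound on the probability =", bound)
```

Because the inequality holds for any distribution with finite variance, the $n$ so obtained is usually far larger than would be needed if the form of the distribution were known.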

Example 8.2. It can be shown that the distribution of the median, $m$, for a random sample of $n$ from $N(\mu,\sigma^2)$ approaches $N(\mu,\sigma_m^2)$ for $n$ large, with $\sigma_m^2 = \pi\sigma^2/2n$. Hence, for large $n$, $m$ may be taken as unbiased and is also a consistent estimator of $\mu$ for normally distributed data. [It can be shown in general that $\sigma_m^2 = 1/4nf^2(m)$ if $f(m) \neq 0$. $\sigma_m^2$ is the asymptotic variance.]

8.5. Efficiency. Since there are usually many consistent estimators of a given parameter $\theta$, we require some additional criterion to use in deciding which of these consistent estimators is the “best.” Also, there might be cases in which it is desirable to use an estimator for which the limiting value deviates by a small amount from the parameter, that is, an inconsistent estimator, if such an estimator were superior in other respects. Another criterion advanced by R. A. Fisher is that the estimator shall have a minimum variance in large samples. An estimator possessing this property is said to be efficient.

Definition. $\hat\theta$ is an efficient estimator of $\theta$ (i) if $\sqrt n\,(\hat\theta - \theta)$ approaches $N(0,\sigma^2)$ as $n \to \infty$ and (ii) for any other estimator $\hat\theta'$ for which $\sqrt n\,(\hat\theta' - \theta)$ approaches $N(0,\sigma'^2)$, $\sigma'^2 \geq \sigma^2$.

The efficiency of $\hat\theta'$ is given by $Ef = \sigma^2/\sigma'^2$.† We shall consider later an estimation method which provides an estimator with minimum variance.

Example 8.3. Consider two consistent estimators of $\mu$ furnished by a random sample of $n$ taken from $N(\mu,\sigma^2)$, that is, the mean $\bar X$, and the median, $m$. We know that $\sigma_{\bar X}^2 = \sigma^2/n$ and also for large samples that $\sigma_m^2 = \pi\sigma^2/2n$. The efficiency of the median relative to the mean is given by

$$Ef = \frac{\sigma^2/n}{\pi\sigma^2/2n} = \frac{2}{\pi} = .64.$$
Example 8.4. Consider the mid-range, MR, as an estimator of $\mu$. MR is the average of the largest $X$ and the smallest $X$ in the sample. It can be shown that if the $n$ members of the sample $\{X_i\}$ are NID$(\mu,\sigma^2)$, then $E(\mathrm{MR}) = \mu$ and

$$\sigma_{\mathrm{MR}}^2 = \frac{\pi^2\sigma^2}{24\log n} + O\left(\frac{1}{\log^2 n}\right).$$

Hence, the efficiency of MR, using the first term only of $\sigma_{\mathrm{MR}}^2$, relatively to the mean, $\bar X$, is given by

$$Ef = \frac{\sigma^2/n}{\pi^2\sigma^2/24\log n} = \frac{24\log n}{\pi^2 n}.$$

But $\lim_{n\to\infty} Ef = 0$. Hence, although MR is an unbiased and consistent estimator for $\mu$, it is decidedly inefficient in large samples as compared with the mean, $\bar X$.
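The rate at which this efficiency falls off can be seen by evaluating the leading-term formula for a few sample sizes. The sketch below assumes natural logarithms and uses only the first term of the variance, so the figures for small $n$ are rough; the function name and the chosen sample sizes are illustrative.

```python
import math

def midrange_efficiency(n):
    """Leading-term efficiency of the mid-range relative to the mean (normal data)."""
    return 24.0 * math.log(n) / (math.pi ** 2 * n)

values = {n: midrange_efficiency(n) for n in (10, 100, 1000, 10000)}
for n, ef in values.items():
    print(n, round(ef, 4))
```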

Example 8.5. Consider the efficiencies of $\bar X$, $m$, and MR as estimators of $\mu$ from a sample of $n$ from the rectangular distribution: $f(X) = 1/w$, $0 < X < w$, $w < \infty$, $\mu = w/2$, and $\sigma^2 = w^2/12$. All three estimators are unbiased. The variances are

$$\sigma_{\bar X}^2 = \frac{\sigma^2}{n} = \frac{w^2}{12n}, \qquad \sigma_m^2 \doteq \frac{3\sigma^2}{n}, \qquad \sigma_{\mathrm{MR}}^2 = \frac{6\sigma^2}{(n+1)(n+2)}.$$

Hence each estimator is consistent. If MR were asymptotically normally distributed, we could infer that the efficiency, $Ef$, of the mean or the median relative to mid-range would be zero. However, MR is not so distributed; hence, we cannot use the criterion of efficiency in making such a comparison. The efficiency of $m$ relative to $\bar X$ is $\tfrac{1}{3}$.

† We shall use the symbol $Ef$ to represent efficiency to distinguish it from $E$, which stands for expected value of.
8.6. Sufficiency. Another criterion which is useful for small samples is sufficiency.

Definition. $\hat\theta$ is said to be a sufficient estimator for estimating $\theta$ if the joint distribution of the sample $\{X_i\}$ of $n$ observations may be put in the form

$$\prod_{i=1}^{n} f(X_i;\theta) = g(X_1,X_2,\ldots,X_n\,|\,\hat\theta)\cdot h(\hat\theta;\theta),$$

where $g(X_1,X_2,\ldots,X_n\,|\,\hat\theta)$ does not involve $\theta$.

In words, it must be possible to subdivide the joint distribution of the sample into the conditional distribution of $(X_1,X_2,\ldots,X_n)$ given $\hat\theta$ multiplied by the joint distribution of $\hat\theta$ and $\theta$ in such a way that the conditional distribution does not involve $\theta$.

In the factored form it is easy to see that no other estimator, say $\hat\theta'$, can provide any information about $\theta$. For the distribution of $\hat\theta'$ for a fixed $\hat\theta$ will be determined by $g(X_1,X_2,\ldots,X_n\,|\,\hat\theta)$ and will involve $\hat\theta$ but not $\theta$. Hence $\hat\theta'$ will provide information about $\hat\theta$ but not $\theta$. But, for any given problem, $\hat\theta$ is known, so that this information is of no value. In the language of R. A. Fisher, a sufficient estimator is one which exhausts all of the information in the sample.

It should be emphasized that sufficiency is not an asymptotic property; it does not require that $n$ be increased without limit or that $\hat\theta$ be distributed normally for large $n$. In view of the above discussion it would appear appropriate to consider a sufficient estimator as the “best” estimator. Certainly if an estimator of $\theta$ is not a function of $\theta$, it can be of no use in estimating $\theta$. Conversely, if an estimator exhausts all the information in a sample, it seems useless to consider any other estimator. Unfortunately there is no sufficient estimator for many parameters.
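The factorization can be illustrated numerically with a standard case that is not taken from the text above: two independent observations from a Poisson distribution with parameter $m$, for which the total (and hence the mean) is sufficient. The sketch below, whose function names are hypothetical, computes the conditional probability of a particular sample given its total for two different values of $m$.

```python
import math

def poisson(x, m):
    """Poisson probability of x with parameter m."""
    return math.exp(-m) * m ** x / math.factorial(x)

def conditional_given_total(x1, total, m):
    """P(X1 = x1 | X1 + X2 = total) for two independent Poisson(m) observations."""
    joint = poisson(x1, m) * poisson(total - x1, m)
    marginal = poisson(total, 2.0 * m)   # the total is Poisson with parameter 2m
    return joint / marginal

c_small = conditional_given_total(1, 3, m=0.7)
c_large = conditional_given_total(1, 3, m=4.0)
print(c_small, c_large)
```

The two conditional probabilities agree whatever value of $m$ is used, which is the sense in which $g$ does not involve the parameter once the sufficient estimator is given.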

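8.7. Amount of Information and a Measure of Efficiency for Small Samples. The minimum variance for $\hat\theta$ is given by

$$\sigma_{\hat\theta}^2 = \frac{1}{n\displaystyle\int_{-\infty}^{\infty}\left(\frac{\partial\log f}{\partial\theta}\right)^2 f\,dX},$$

where $f(X;\theta)$ has been abbreviated to $f$. This minimum variance can be achieved only if $\hat\theta$ is an unbiased sufficient estimator of $\theta$ and if

$$\frac{\partial\log h}{\partial\theta} = k(\hat\theta - \theta),$$

where $k$ may be a function of $\theta$ but is independent of $\hat\theta$, and $h(\hat\theta;\theta) > 0$.

R. A. Fisher has designated the reciprocal of $\sigma_{\hat\theta}^2$ the amount of information in the sample of $n$, that is,

$$I = n\int_{-\infty}^{\infty}\left(\frac{\partial\log f}{\partial\theta}\right)^2 f\,dX.$$

Now, if $\hat\theta$ is the estimator with minimum variance, then the efficiency of any other estimator $\hat\theta'$ will be measured by

$$Ef = \frac{\sigma_{\hat\theta}^2}{\sigma_{\hat\theta'}^2} = \frac{1}{I\sigma_{\hat\theta'}^2}.$$

If the estimator $\hat\theta$ is asymptotically normally distributed so that

$$h(\hat\theta;\theta) = \frac{1}{\sigma_{\hat\theta}\sqrt{2\pi}}\,e^{-(\hat\theta-\theta)^2/2\sigma_{\hat\theta}^2},$$

then

$$\log[h(\hat\theta;\theta)] = \text{constant} - \frac{(\hat\theta-\theta)^2}{2\sigma_{\hat\theta}^2},$$

$$\frac{\partial\log h}{\partial\theta} = \frac{\hat\theta-\theta}{\sigma_{\hat\theta}^2}.$$

Hence, in this case the second requirement for minimum variance is met with $k = 1/\sigma_{\hat\theta}^2$, provided $\sigma_{\hat\theta}^2$ exists.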

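8.8. Other Forms of I. We can write the integrand of I in several
additional ways. Since

$$\frac{\partial \log f}{\partial \theta} = \frac{1}{f} \frac{\partial f}{\partial \theta},$$

the integrand may be written as

$$\frac{1}{f} \left( \frac{\partial f}{\partial \theta} \right)^2.$$

Also,

$$\frac{\partial^2 \log f}{\partial \theta^2} = \frac{1}{f} \frac{\partial^2 f}{\partial \theta^2} - \frac{1}{f^2} \left( \frac{\partial f}{\partial \theta} \right)^2.$$

Hence

$$\int \left( \frac{\partial \log f}{\partial \theta} \right)^2 f \, dX = -\int \frac{\partial^2 \log f}{\partial \theta^2} \, f \, dX + \int \frac{\partial^2 f}{\partial \theta^2} \, dX.$$

But the last integral on the right is zero; hence

$$I = -n \int \frac{\partial^2 \log f}{\partial \theta^2} \, f \, dX.$$
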

Example 8.6. Consider a random sample of n from N(^c,o'^). The 
joint frequency distribution is 
Example 8.7. For a random sample of n from the Poisson distribution
with mean m,

$$\log [f(X;m)] = X \log m - m - \log X!,$$

$$I = n \sum_{X=0}^{\infty} \left( \frac{X}{m} - 1 \right)^2 f(X;m) = \frac{n}{m}.$$

Hence the minimum variance is $m/n$, which is the variance of $\bar X$.
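The Poisson result can be checked numerically. The sketch below is illustrative only: it assumes a Poisson parent with mean m, truncates the infinite sum at a hypothetical upper limit of 200 (ample for small m), and evaluates the information in both the squared-first-derivative form and the second-derivative form of Sec. 8.8. The function names and the values of m and n are assumptions made for the example.

```python
import math

def poisson_pmf(x, m):
    """Poisson probability f(x; m), computed on the log scale for stability."""
    return math.exp(x * math.log(m) - m - math.lgamma(x + 1))

def information_first_form(m, upper=200):
    """Sum of (d log f / dm)^2 f, where d log f / dm = x/m - 1."""
    return sum((x / m - 1.0) ** 2 * poisson_pmf(x, m) for x in range(upper + 1))

def information_second_form(m, upper=200):
    """Minus the expected second derivative; d^2 log f / dm^2 = -x / m^2."""
    return sum((x / m ** 2) * poisson_pmf(x, m) for x in range(upper + 1))

m, n = 2.5, 10
I_first = n * information_first_form(m)
I_second = n * information_second_form(m)
min_variance = 1.0 / I_first
print(I_first, I_second, min_variance)
```

Both forms should agree with n/m, and the reciprocal with m/n, up to truncation and rounding error.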


EXERCISES 

8.1. Show that $\sum (X - \bar X)^2/n$ is a biased estimator for $\sigma^2$.

8.2. Given a random sample of n from $N(0,\sigma^2)$.

(a) Show that $s^2 = \sum X^2/n$ is an unbiased estimator for $\sigma^2$. Hint:

(b) Show that $\sigma_{s^2}^2 = 2\sigma^4/n$. Hint:

$$\sigma_{s^2}^2 = E[(s^2 - \sigma^2)^2] = E[(s^2)^2] - (\sigma^2)^2.$$

Use $E[(s^2)^2] = (\sigma^4/n^2) E[(\chi^2)^2]$ to complete the demonstration.

(c) Is $s^2$ also consistent? Hint: Use the results of (b) and
Tchebysheff's inequality.

(d) Using the concept of information, I, show that the minimum
variance of $s^2$ is $2\sigma^4/n$. Is $s^2$ efficient in small samples?

(e) Is $s^2$ a sufficient estimator for $\sigma^2$?

8.3. (a) Show that $h(\bar X;m)$ for a sample from a Poisson distribution
meets the conditions given for minimum variance. Hint: From Example
8.7, $h(\bar X;m) = e^{-nm}(nm)^{n\bar X}/(n\bar X)!$. Now, show that
$\partial \log h/\partial m = (n/m)(\bar X - m)$.

(b) Is $\bar X$ consistent in this case?

8.4. Suppose N experiments consisting of n trials each are conducted
with data assumed binomially distributed with constant probability p.
That is,

$$f(X;p) = C(n,X)\, p^X q^{n-X}, \qquad 0 \le X \le n.$$

We estimate p by $\hat p = \sum X/Nn$. Discuss this estimator as to:

(a) Sufficiency, by showing that $h(\hat p;p) = C(Nn, Nn\hat p)\, p^{Nn\hat p} q^{Nn - Nn\hat p}$.

(b) Bias.

(c) Information, by showing $I = Nn/pq$.

(d) Consistency, by using $\sigma_{\hat p}^2 = pq/Nn$ and Tchebysheff's inequality.

8.5. Given a sample of n taken from the Cauchy distribution

$$f(X;m) = \frac{1}{\pi [1 + (X - m)^2]}.$$

(a) It can be shown that $\bar X$ has the same distribution as X. Hence
what can be said about the usefulness of $\bar X$ as an estimate of m in this case?

(b) Using the concept of information, show that the minimum variance
of an estimator of m is $2/n$.

(c) Show that the asymptotic variance of the median for this
distribution is $\pi^2/4n$. What is its asymptotic efficiency?

(d) Is the median consistent in this case?

8.6. R. A. Fisher has shown that



where $s^2 = \sum (X - \bar X)^2/(n - 1)$ and the sample is from an arbitrary
universe. Show that $s^2$ is a consistent estimator for $\sigma^2$.
CHAPTER 9


PRINCIPLES OF POINT ESTIMATION:

MAXIMUM LIKELIHOOD

9.1. Introduction. Several principles of estimation, leading to routine
mathematical procedures, have been proposed for obtaining "good"
estimators. These include the principle of moments, minimum chi-square,
the method of least squares, and the principle of maximum likelihood.
The application of these principles in particular cases will lead to
estimators which may differ and hence possess different attributes of
"goodness." A principle much in use, yielding estimators with many desirable
attributes of "goodness" obtained by easily applied routine mathematical
procedures, is that of maximum likelihood devised by R. A. Fisher. In
this chapter we shall discuss this important principle of estimation in
detail.

9.2. Basic Theory. The procedure for determining the maximum-likelihood
(ML) estimator for a population parameter $\theta$ is as follows:

(a) Determine the distribution function of the sample,
$f(X_1,X_2,\ldots,X_n;\theta)$. This is the probability of obtaining the particular
sample for discrete variates and is this probability without the
differentials for continuous variates. R. A. Fisher has called this the
likelihood of the sample.

(b) Determine $L = \log f(X_1,X_2,\ldots,X_n;\theta)$.

(c) Determine the value of $\theta$ which will maximize L by solving the
equation

$$\frac{\partial L}{\partial \theta} = 0.$$

This will also maximize the likelihood. From the previous chapter we
recall that a sufficient estimator $\hat\theta$ for $\theta$ exists when the joint distribution
of the sample may be put in the form

$$f(X_1,X_2,\ldots,X_n;\theta) = g(X_1,X_2,\ldots,X_n \mid \hat\theta) \cdot h(\hat\theta;\theta),$$

where $g(X_1,X_2,\ldots,X_n \mid \hat\theta)$ does not involve $\theta$. Hence the ML equation
reduces to

$$\frac{\partial \log [h(\hat\theta;\theta)]}{\partial \theta} = 0.$$

But any solution of this equation can depend only on $\hat\theta$; hence, this ML
solution must be the sufficient estimator, as it depends on no other
estimator.
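Steps (a) to (c) can be sketched numerically. The example below is an illustration only, assuming a Poisson parent with mean m; the sample values and function names are hypothetical. It forms the log likelihood L, solves the ML equation by bisection on the derivative, and leaves the solution available for comparison with the sample mean.

```python
import math

def log_likelihood(sample, m):
    """Step (b): L = log of the joint Poisson probability of the sample."""
    return sum(x * math.log(m) - m - math.lgamma(x + 1) for x in sample)

def score(sample, m):
    """dL/dm for the Poisson log likelihood: (sum of X)/m - n."""
    return sum(sample) / m - len(sample)

def solve_ml_equation(sample, lo=1e-6, hi=1e3, tol=1e-10):
    """Step (c): bisection on dL/dm = 0; the derivative decreases as m grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(sample, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [2, 0, 3, 1, 4, 2]           # hypothetical counts
m_hat = solve_ml_equation(sample)
sample_mean = sum(sample) / len(sample)
print(m_hat, sample_mean)
```

In this case the equation can of course be solved by hand; the numerical route is shown only because the same pattern applies when no closed form is available.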

9.3. Maximum-likelihood and Efficient Estimators in Small Samples.

If an efficient estimator for small samples exists (has minimum variance),
the ML estimator, adjusted for bias if necessary, will be efficient.

In order that $\hat\theta$ have minimum variance, we know from the previous
chapter that $\hat\theta$ must be an unbiased sufficient estimator and also that

$$\frac{\partial \log h}{\partial \theta} = k(\hat\theta - \theta).$$

But to obtain the ML estimator we set this, or what amounts to the same
thing,

$$\frac{\partial \log [h(\hat\theta;\theta)]}{\partial \theta},$$

equal to zero. Hence, $\hat\theta$, the unbiased ML estimator, is an efficient estimator in small
samples. It follows from the previous chapter that the minimum
variance that an unbiased ML estimator can have is $1/I(\theta)$. If the ML
estimator, $\hat\theta$, is biased in such a way that $E(\hat\theta) = \theta + b(\theta)$, the minimum
variance is

$$\frac{[1 + db(\theta)/d\theta]^2}{I(\theta)}.$$

If the ML estimator is biased, we may adjust it for bias and the minimum
variance of the adjusted $\hat\theta$ will be $1/I(\theta)$.

9.4. Maximum-likelihood Estimators for Two or More Parameters.
If there are h parameters for which estimators are desired, we solve the
set of h equations

$$\frac{\partial L}{\partial \theta_i} = 0, \qquad i = 1, 2, \ldots, h.$$

In the case of two parameters, $\theta_1$ and $\theta_2$, Cramér has shown that the
minimum variance for $\hat\theta_1$ is

$$\frac{1}{1 - \rho^2} \cdot \frac{1}{I(\theta_1)},$$

where 




The joint efficiency in small samples is given by Cramer* as 

1 1 

(1 - p^r ■ [{eoiio^WA 
The asymptotic efficiency $E_J$ is given by the limiting value of this joint
small-sample efficiency as $n \to \infty$. Similar formulas may be derived
for h > 2.

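9.5. Examples Using the Principle of Maximum Likelihood

Example 9.1. Consider a random sample of n from $N(\mu,\sigma^2)$. In this
case

$$L = \log f(X_1,X_2,\ldots,X_n;\mu,\sigma^2) = -\frac{n}{2}(\log 2\pi + \log \sigma^2) - \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{2\sigma^2}.$$

Let us study the following three cases:

(a) $\sigma^2$ known

$$\frac{\partial L}{\partial \mu} = 0: \quad \sum_{i=1}^{n} (X_i - \hat\mu) = 0, \qquad \hat\mu = \frac{\sum X_i}{n} = \bar X,$$

and

$$\sigma_{\bar X}^2 = \frac{\sigma^2}{n} \quad \text{(the minimum variance)}.$$

(b) $\mu$ known

$$\frac{\partial L}{\partial \sigma^2} = 0: \quad -\frac{n}{2\hat\sigma^2} + \frac{\sum (X_i - \mu)^2}{2(\hat\sigma^2)^2} = 0, \qquad \hat\sigma^2 = \frac{\sum (X_i - \mu)^2}{n},$$

$$\sigma_{\hat\sigma^2}^2 = \frac{2\sigma^4}{n} \quad \text{(also the minimum variance)}.$$

Note that

$$s^2 = \frac{\sum (X_i - \bar X)^2}{n - 1}$$

in this case is not completely efficient in small samples since

$$\sigma_{s^2}^2 = \frac{2\sigma^4}{n - 1}.$$

(c) Both $\mu$ and $\sigma^2$ unknown

$$\hat\mu = \bar X, \qquad \hat\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{n}.$$

In case c, $\hat\sigma^2$ is biased, and we would use $s^2$ as the unbiased estimator for
$\sigma^2$. It has previously been shown that $\bar X$ and $s^2$ are independently
distributed. The two estimators $\bar X$ and $s^2$ are sufficient since

$$f(X_1,X_2,\ldots,X_n;\mu,\sigma^2) = g(X_1,X_2,\ldots,X_n) \cdot h_1(\bar X;\mu,\sigma^2) \cdot h_2(s^2;\sigma^2),$$

where $g(X_1,X_2,\ldots,X_n)$ is independent of $\mu$ and $\sigma^2$ (in this case, also
independent of $\bar X$ and $s^2$).

We see that $\bar X$ is still efficient in small samples with $\sigma_{\bar X}^2 = \sigma^2/n$.
However, $s^2$ is not quite efficient, since $\sigma_{s^2}^2 = 2\sigma^4/(n - 1)$ and the minimum
variance in small samples is still $2\sigma^4/n$ ($\bar X$ and $s^2$ are independently
distributed). Hence the efficiency of $s^2$ is $(n - 1)/n$, which approaches
1 as $n \to \infty$. The joint small-sample efficiency is also $(n - 1)/n$.

To show that the minimum variance of $s^2$ is $1/I(\sigma^2)$, consider the
following:

$$E(\hat\sigma^2) = \frac{(n - 1)\sigma^2}{n} = \sigma^2 - \frac{\sigma^2}{n}.$$

Then $\sigma^2 - (\sigma^2/n)$ is of the form $\theta + b(\theta)$, with $db(\theta)/d\theta = -1/n$. But
$s^2 = n\hat\sigma^2/(n - 1)$, and hence the minimum variance of $s^2$ is

$$\left( \frac{n}{n - 1} \right)^2 \frac{\left( 1 - \dfrac{1}{n} \right)^2}{I(\sigma^2)} = \frac{1}{I(\sigma^2)} = \frac{2\sigma^4}{n}.$$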
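The estimators of case (c) can be computed directly. The following is a minimal sketch with a hypothetical sample; the function name is an assumption made for the example. It returns the ML estimators of the mean and variance together with the unbiased variance estimator, and evaluates the small-sample efficiency of the latter as the ratio of the minimum variance to its actual variance.

```python
def normal_ml_estimates(sample):
    """Return (mu_hat, sigma2_ml, s2) for a sample assumed to be normal."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    sigma2_ml = sum_sq / n        # ML estimator, biased
    s2 = sum_sq / (n - 1)         # adjusted for bias
    return mean, sigma2_ml, s2

sample = [4.1, 5.3, 6.0, 4.7, 5.9]   # hypothetical observations
mu_hat, sigma2_ml, s2 = normal_ml_estimates(sample)

n = len(sample)
# Efficiency of s2: (2 sigma^4 / n) divided by (2 sigma^4 / (n - 1));
# the common factor 2 sigma^4 cancels.
efficiency_s2 = (1.0 / n) / (1.0 / (n - 1))
print(mu_hat, sigma2_ml, s2, efficiency_s2)
```

For n = 5 the efficiency is (n - 1)/n = 0.8, which shows how slowly the loss disappears in very small samples.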




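Example 9.2. Given N random samples consisting of n trials each
from a population assumed binomially distributed with constant
probability, p, so that

$$f(X;p) = C(n,X)\, p^X (1 - p)^{n - X},$$

$$f(X_1,X_2,\ldots,X_N;p) = \prod_{i=1}^{N} C(n,X_i)\, p^{\sum X_i} (1 - p)^{\sum (n - X_i)},$$

$$L = \sum_{i=1}^{N} \log C(n,X_i) + \sum_{i=1}^{N} X_i \log p + \sum_{i=1}^{N} (n - X_i) \log (1 - p).$$

Hence

$$\frac{\partial L}{\partial p} = 0: \quad \hat p = \frac{\sum X_i}{\sum (n - X_i) + \sum X_i} = \frac{\sum X_i}{Nn}.$$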
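As a numerical check on the binomial case (a sketch with hypothetical counts, not part of the original derivation), the log likelihood L can be evaluated over a grid of values of p and its maximizing value compared with the closed-form estimator, the total number of successes divided by Nn.

```python
import math

def binomial_log_likelihood(counts, n, p):
    """L for N binomial samples of n trials each, with success counts `counts`."""
    return sum(
        math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)
        for x in counts
    )

counts = [3, 5, 4, 6, 2]   # hypothetical successes in N = 5 experiments
n = 10                     # trials per experiment

# Closed-form ML estimator.
p_hat = sum(counts) / (len(counts) * n)

# Crude grid search over p = 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: binomial_log_likelihood(counts, n, p))
print(p_hat, p_grid)
```

The term involving C(n, X) does not depend on p, so it could be dropped from the search without changing the maximizing value.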




EXERCISES 

9.1. Determine the ML estimator for m in the Poisson distribution. 

9.2. Given the Pearson Type III distribution

$$f(X;\alpha) = \frac{X^{\lambda - 1} e^{-X/\alpha}}{\alpha^{\lambda}\, \Gamma(\lambda)}, \qquad 0 < X < \infty,$$

where $\lambda$ is some fixed positive constant, (a) Show that the ML estimator
for $\alpha$ for a sample of n is $\hat\alpha = \bar X/\lambda$. (b) It can be shown that $\sigma_{\hat\alpha}^2 = \alpha^2/n\lambda$.

Using the concept of information, show that the minimum variance is also
given by $\alpha^2/n\lambda$. (c) Is the estimator consistent? (d) Is the estimator
efficient in small samples? (e) Show that $E(X) = \alpha\lambda$, hence that
$E(\hat\alpha) = \alpha$ and $\hat\alpha$ is unbiased. (f) Show that

n 

J~[ Xi^~'^ ’ 

f{Xi,X2, . . . ,Xn]oi) — [r(x)]^~^ 

and hence $\hat\alpha$ is a sufficient estimator for $\alpha$.

9.3. Derive for random samples of n the ML estimators of $\mu$, $\beta$, and
$\sigma^2$ in the regression equation $Y = \mu + \beta x + e$, where e is $N(0,\sigma^2)$. (a) What
are the variances of $\hat\mu$ and $\hat\beta$? (b) Are these efficient? (c) Is the ML
estimator for $\sigma^2$ unbiased?

9.4. Given the bivariate normal distribution with parameters: My, 

0 - 2 , (tI, pxy. (a) Find the ML estimators for each parameter for random 

samples of n. (b) Show that (pY = 0Y ■ where d is obtained from 

Exercise 9.3. (c) Is there any difference between <j\ above and the of 

Exercise 9.3? 
CHAPTER 10

INTERVAL ESTIMATION

10.1. Introduction. In Sec. 6.1 we introduced a method of estimating
the value of a parameter, $\theta$, which is less restrictive than that of point
estimation. This method, called interval estimation, derives limits $c_1$ and
$c_2$ which are functions of the sample values $\{X_i\}$ or functions of the sample
values and known population parameters. We recall that the interval
$(c_1,c_2)$ is called the confidence interval, which is determined so that, in
repeated sampling from the same population, the interval will contain the
parameter $\theta$ a certain percentage of the time. Symbolically $c_1$ and $c_2$ are
determined so that

$$P(c_1 < \theta < c_2) \ge 1 - \alpha,$$

that is, the probability of the interval $(c_1,c_2)$ containing the population
parameter $\theta$ is greater than or equal to $1 - \alpha$. The value $\alpha$ is a small
positive number less than 1. The probability holds only for a large
number of similarly drawn samples, where $c_1$ and $c_2$ are recalculated for each
sample. The value $(1 - \alpha)$ is called the confidence probability. In
Chap. 7 methods and examples were given introducing confidence-interval
construction in connection with the derivations of certain important
sampling distributions.

These concepts were introduced by Neyman. R. A. Fisher uses the
terms fiducial interval and fiducial probability to indicate substantially the
same concepts, though he restricts his results to sufficient statistics.

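10.2. Intervals for Sufficient Estimators. If $\hat\theta$ is a sufficient estimator
such that

$$\prod_{i=1}^{n} f(X_i;\theta) = g(X_1,X_2,\ldots,X_n \mid \hat\theta) \cdot h(\hat\theta;\theta),$$

then the problem of estimating the confidence limits becomes one of
finding the limits $\gamma_1(\theta)$ and $\gamma_2(\theta)$ such that

$$\int_{\gamma_1}^{\gamma_2} h(\hat\theta;\theta)\, d\hat\theta = 1 - \alpha.$$

It follows that

$$P[\gamma_1(\theta) < \hat\theta < \gamma_2(\theta)] = 1 - \alpha.$$

We then solve the equations $\gamma_1(\theta) = \hat\theta$ and $\gamma_2(\theta) = \hat\theta$ for $\theta$ and obtain the
solutions $c_2$ and $c_1$, respectively.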
Let us consider the problem graphically. In Fig. 10.1 the curves $\gamma_1(\theta)$
and $\gamma_2(\theta)$ are drawn so that $P[\gamma_1(\theta) < \hat\theta < \gamma_2(\theta)] = 1 - \alpha$ if $f(X_i;\theta)$ is
continuous, and $P \ge 1 - \alpha$ if $f(X_i;\theta)$ is discrete. Let $c_1$ and $c_2$ be the
intersections of the straight line $\hat\theta = \hat\theta_0$ with the curves $\gamma_2(\theta)$ and $\gamma_1(\theta)$,
respectively (note the interchange of subscripts). $\hat\theta_0$ is the particular
value of $\hat\theta$ obtained from the sample. The line segment $(c_1,c_2)$ will
intersect the line $\theta = \theta_0$ (the true value of the parameter) only if $\hat\theta$ falls
between $\gamma_{10}$ and $\gamma_{20}$. But the probability of the latter event is

$$P(\gamma_{10} < \hat\theta < \gamma_{20}) = 1 - \alpha.$$

Hence, $1 - \alpha$ is also the probability that the variable interval $(c_1,c_2)$
includes $\theta_0$.

Fig. 10.1. Graphical representation of confidence interval.

We may summarize as follows:

$$1 - \alpha = P(\gamma_{10} < \hat\theta < \gamma_{20} \mid \theta = \theta_0) = P(c_1 c_2 \text{ intersects } \theta = \theta_0) = P(c_1 < \theta_0 < c_2).$$

This does not imply that $\theta$ has a distribution or that on a given trial there
is a probability that the true value $\theta_0$ lies between $c_1$ and $c_2$. What is
meant is that, if a series of trials are made, in about $100(1 - \alpha)$ per cent
of these trials the variable interval $(c_1,c_2)$ will include $\theta_0$, the true value of $\theta$.
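The repeated-sampling interpretation can be illustrated by simulation. The sketch below is not from the text: it assumes a normal parent with known standard deviation and uses the symmetric interval for the mean as a convenient case; the parameter values, seed, and function name are hypothetical. It counts how often the variable interval includes the fixed true value.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated samples whose interval includes the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # upper alpha/2 normal point
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval varies from sample to sample; mu stays fixed.
        if mean - half_width < mu < mean + half_width:
            hits += 1
    return hits / trials

observed = coverage(mu=10.0, sigma=2.0, n=9, alpha=0.05, trials=4000)
print(observed)
```

The observed fraction should lie close to 1 - alpha, with the discrepancy shrinking as the number of trials grows.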
10.3. Shortest Confidence Interval. It is clear that there are an
infinity of possible limits $(\gamma_1,\gamma_2)$ such that $P(\gamma_1 < \hat\theta < \gamma_2) = 1 - \alpha$.
For example, we might take $\gamma_1 = -\infty$ so that $P(\hat\theta > \gamma_2) = \alpha$, or we
might take $P(\hat\theta > \gamma_2) = P(\hat\theta < \gamma_1) = \alpha/2$.† In determining which of
the infinity of possible limits $(\gamma_1,\gamma_2)$ to use, we shall usually wish to make
the confidence interval as small as possible. If we consider only unbiased
estimates which are asymptotically normally distributed, this interval
can be made as small as possible by choosing the $\hat\theta$ with the smallest
variance and selecting the limits $(\gamma_1,\gamma_2)$ such that

$$\alpha_1 = \alpha_2 = \frac{\alpha}{2}.$$

The ML estimator possesses this desired property. For detailed
discussions of this topic, see Neyman and Wilks.

† It will be convenient, in general, to let $P(\hat\theta > \gamma_2) = \alpha_2$ and $P(\hat\theta < \gamma_1) = \alpha_1$,
where $\alpha_1 + \alpha_2 = \alpha$.

10.4. More than One Unknown Parameter. If the frequency
distribution is a function of several unknown parameters $\{\theta_i\}$, $i = 1, 2, \ldots, h$,
there is a confidence interval for one of these parameters, say
$\theta_1$, if a function of the sample $(X_1,X_2,\ldots,X_n)$ and $\theta_1$,
$\phi(X_1,X_2,\ldots,X_n;\theta_1)$, can be found such that

$$\int_{\phi_1}^{\phi_2} f(\phi)\, d\phi = 1 - \alpha,$$

where $\phi_1$ and $\phi_2$ are numerical values of $\phi = \phi(X_1,X_2,\ldots,X_n;\theta_1)$ and $\phi$
is independent of the other $\{\theta_i\}$. In this case

$$P[\phi_1 < \phi(X_1,X_2,\ldots,X_n;\theta_1) < \phi_2] = 1 - \alpha.$$

By reversing the inequalities $\phi < \phi_2$ and $\phi > \phi_1$ we can find values of $c_1$
and $c_2$ such that

$$P[c_1 < \theta_1 < c_2] = 1 - \alpha,$$

where $c_1$ and $c_2$ are functions of $(X_1,X_2,\ldots,X_n)$ and also $\phi_2$ and $\phi_1$,
respectively.

The problem of finding a function $\phi$ which is independent of the other
parameters is often quite difficult. The use of the usual t, used in t tests,
to solve the Behrens-Fisher problem is an example of this. Hotelling
introduced the term nuisance parameters to apply to these other
parameters which appear in the distribution of the statistic but which we wish to
eliminate when making statements concerning confidence limits for one of
the parameters.

Example 10.1. It is desired to estimate the 95 per cent confidence
interval ($\alpha = .05$) for the mean $\mu$ of a population $N(\mu,\sigma^2)$ by use of a
random sample of n. We consider the following cases:

(a) $\sigma^2$ known. The ML estimator of $\mu$ is $\bar X = \sum X_i/n$. Since $\bar X$ is
$N(\mu,\sigma^2/n)$, it is evident that the shortest interval will be obtained by
letting $\alpha_1 = \alpha_2 = .025$; that is,

$$\int_{-\infty}^{\gamma_1} f(\bar X)\, d\bar X = \int_{\gamma_2}^{\infty} f(\bar X)\, d\bar X = .025.$$

This is true because of the concentration of the probability, or area under
the curve, about $\mu$ for the symmetrical normal distribution.

Now, the integral on the right above becomes

$$\sqrt{\frac{n}{2\pi\sigma^2}} \int_{\gamma_2}^{\infty} e^{-n(\bar X - \mu)^2/2\sigma^2}\, d\bar X = .025.$$

Let $T = \sqrt{n}\,(\bar X - \mu)/\sigma$ and the integral becomes

$$\int_{\sqrt{n}(\gamma_2 - \mu)/\sigma}^{\infty} N(0,1)\, dT = .025.$$

The value of the lower limit of this integral may be obtained from a table
of areas for the normal curve. It will be found that

$$\frac{\sqrt{n}\,(\gamma_2 - \mu)}{\sigma} = 1.96,$$

and hence,

$$\gamma_2 = \mu + \frac{1.96\sigma}{\sqrt{n}}.$$

Similarly,

$$\gamma_1 = \mu - \frac{1.96\sigma}{\sqrt{n}}.$$

It follows that

$$P\left[ \mu - \frac{1.96\sigma}{\sqrt{n}} < \bar X < \mu + \frac{1.96\sigma}{\sqrt{n}} \right] = .95.$$

Reversing the positions of $\mu$ and $\bar X$, we have

$$P\left[ \bar X - \frac{1.96\sigma}{\sqrt{n}} < \mu < \bar X + \frac{1.96\sigma}{\sqrt{n}} \right] = .95.$$

Note that $c_1 = \bar X - (1.96\sigma/\sqrt{n})$ corresponds to $\gamma_2 = \mu + (1.96\sigma/\sqrt{n})$,
and similarly for $c_2$ and $\gamma_1$; that is, if

$$\bar X < \gamma_2 = \mu + \frac{1.96\sigma}{\sqrt{n}},$$

then

$$\mu > \bar X - \frac{1.96\sigma}{\sqrt{n}} = c_1.$$

Unless the nuisance parameter $\sigma$ is known, these confidence limits are of
little use. This result is the same as that obtained in Sec. 7.2.
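The limits of case (a) are easily computed. The following is a minimal sketch, assuming the standard deviation is known; the sample values, the value of sigma, and the function name are hypothetical. The normal deviate is taken from the standard library rather than from a printed table.

```python
from statistics import NormalDist, mean

def mean_interval_known_sigma(sample, sigma, alpha=0.05):
    """Limits (c1, c2) for the mean of a normal population with known sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    half_width = z * sigma / n ** 0.5
    xbar = mean(sample)
    return xbar - half_width, xbar + half_width

sample = [9.8, 10.4, 11.1, 9.5, 10.2, 10.6, 9.9, 10.5, 10.0]   # hypothetical
c1, c2 = mean_interval_known_sigma(sample, sigma=0.6)
print(c1, c2)
```

Note that the length of the interval depends only on sigma, n, and alpha, not on the observations themselves.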

(b) $\sigma^2$ unknown. In this case we are concerned with two unknown
parameters, $\mu$ and $\sigma^2$. It is known that $t = \sqrt{n}\,(\bar X - \mu)/s$ is distributed
as Student's t with $(n - 1)$ degrees of freedom. Hence t is a function of
only $\mu$ and the sample values n, $\bar X$, and s. It is possible then to find
numerical values $t_1$ and $t_2$ such that

$$P[t_1 < t < t_2] = .95.$$

Since t has a symmetrical distribution with its maximum density in the
center, the shortest confidence interval with $\alpha = .05$ will result from
setting

$$P(t > t_2) = P(t < t_1) = .025.$$

In this case $t_1 = -t_2$. Since the values of $t_1$ and $t_2$ depend on n, we cannot
find unique confidence limits as in (a) above. For n = 4 (3 degrees of
freedom), $-t_1 = t_2 = 3.182$, and for n = 20 (19 degrees of freedom),
$-t_1 = t_2 = 2.093$.

The reverse limits are obtained as follows: For $t = \sqrt{n}\,(\bar X - \mu)/s < t_2$,
$\mu > \bar X - t_2 s/\sqrt{n}$, and similarly for $t > t_1$ we obtain $\mu < \bar X - t_1 s/\sqrt{n}$.
Hence

$$P\left[ \bar X - \frac{t_2 s}{\sqrt{n}} < \mu < \bar X - \frac{t_1 s}{\sqrt{n}} \right] = .95,$$

where .95 is the confidence probability. The confidence limits are now
independent of the nuisance parameter $\sigma$ since t was used instead of T, as
in Sec. 7.10.
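Case (b) can be sketched in the same way. Since the Python standard library has no Student's t quantile function, the example below takes the tabled value as an argument (3.182 for 3 degrees of freedom, as quoted in the text) and assumes symmetric limits, so that the lower t value is the negative of the upper one. The sample values and function name are hypothetical.

```python
from statistics import mean, stdev

def mean_interval_unknown_sigma(sample, t2):
    """Limits for the mean using Student's t; t2 is the tabled upper .025 point."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)              # square root of the (n - 1)-divisor variance
    half_width = t2 * s / n ** 0.5
    return xbar - half_width, xbar + half_width

sample = [12.1, 11.4, 13.0, 12.3]   # hypothetical, n = 4 (3 degrees of freedom)
c1, c2 = mean_interval_unknown_sigma(sample, t2=3.182)
print(c1, c2)
```

Unlike case (a), the length of this interval varies from sample to sample because it depends on s.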

Example 10.2. In Example 10.1(b), determine a 95 per cent
confidence interval for $\sigma^2$. We know that $\chi^2 = (n - 1)s^2/\sigma^2$ is distributed
as chi-square with $(n - 1)$ degrees of freedom. Let $\chi_1^2$ and $\chi_2^2$ be values
of $\chi^2$ such that

$$\int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\, d(\chi^2) = .95.$$

Then

$$P[\chi_1^2 < \chi^2 < \chi_2^2] = .95,$$

and for $\chi^2 = (n - 1)s^2/\sigma^2 < \chi_2^2$, $\sigma^2 > (n - 1)s^2/\chi_2^2$. Similarly, for $\chi^2 > \chi_1^2$,
$\sigma^2 < (n - 1)s^2/\chi_1^2$. Hence, the confidence interval for $\sigma^2$ is

$$\frac{(n - 1)s^2}{\chi_2^2} < \sigma^2 < \frac{(n - 1)s^2}{\chi_1^2}.$$

As for Example 10.1(b), the values of $\chi_1^2$ and $\chi_2^2$ depend on n (see Exercise
7.13).

In this case the problem of selecting values of $\chi_1^2$ and $\chi_2^2$ in order to
obtain the shortest confidence interval is more complicated because of the
skewness of the $\chi^2$ distribution. It is clear that the interval is
proportional to

$$\frac{1}{\chi_1^2} - \frac{1}{\chi_2^2}.$$

Let us illustrate the difficulty in selecting values of $\chi_1^2$ and $\chi_2^2$ in order to
obtain the shortest confidence interval for the case n = 3 (2 degrees of
freedom). We have

$$f(\chi^2) = \tfrac{1}{2}\, e^{-\chi^2/2}.$$

(a) If we select $\chi_1^2$ and $\chi_2^2$ so that $\alpha_1 = \alpha_2 = .025$, then $\chi_2^2 = 7.38$, and
$\chi_1^2 = .05066$. Hence

$$\frac{1}{\chi_1^2} - \frac{1}{\chi_2^2} = 19.8 - 0.1 = 19.7,$$

and the confidence interval is $0.2s^2 < \sigma^2 < 39.6s^2$.

(b) If we select $\chi_1^2$ and $\chi_2^2$ so that $\alpha_1 = .05$ and $\alpha_2 = 0$, then $\chi_2^2 = \infty$
and $\chi_1^2 = .1026$. Hence

$$\frac{1}{\chi_1^2} - \frac{1}{\chi_2^2} = 9.7.$$

The confidence interval now becomes

$$0 < \sigma^2 < 19.4s^2,$$

which is much shorter than that obtained in (a) above.

(c) By minimizing $1/\chi_1^2 - 1/\chi_2^2$ subject to the relation

$$\int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\, d(\chi^2) = 1 - \alpha,$$

it is possible to show that the values $\chi_1^2$ and $\chi_2^2$ should satisfy the following
relationship in order to provide the shortest confidence interval for $\sigma^2$:

$$\chi_1^4 f(\chi_1^2) = \chi_2^4 f(\chi_2^2).$$
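For 2 degrees of freedom the chi-square points have a closed form, so the three choices (a), (b), and (c) can be compared numerically. The sketch below is illustrative: the function names are hypothetical, the tail areas follow the convention of the example (the first is the lower-tail area, the second the upper-tail area), and condition (c) is solved by bisection on the upper-tail area. The results should be close to the rounded figures quoted above.

```python
import math

def lower_point(alpha1):
    """Chi-square value (2 d.f.) with lower-tail area alpha1."""
    return -2.0 * math.log(1.0 - alpha1)

def upper_point(alpha2):
    """Chi-square value (2 d.f.) with upper-tail area alpha2; infinite if 0."""
    return math.inf if alpha2 == 0 else -2.0 * math.log(alpha2)

def relative_length(alpha1, alpha2):
    """The quantity 1/chi1^2 - 1/chi2^2 to which the interval is proportional."""
    return 1.0 / lower_point(alpha1) - 1.0 / upper_point(alpha2)

def shortest_split(alpha=0.05, tol=1e-14):
    """Upper-tail area alpha2 satisfying condition (c) for 2 d.f."""
    def gap(alpha2):
        x1 = lower_point(alpha - alpha2)
        x2 = upper_point(alpha2)
        # x^2 f(x) with f(x) = exp(-x/2)/2; the factor 1/2 cancels.
        return x1 ** 2 * math.exp(-x1 / 2) - x2 ** 2 * math.exp(-x2 / 2)
    lo, hi = 1e-12, alpha - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:      # gap decreases as alpha2 increases
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

length_equal_tails = relative_length(0.025, 0.025)      # case (a)
length_one_sided = relative_length(0.05, 0.0)           # case (b)
alpha2_best = shortest_split()                           # case (c)
length_shortest = relative_length(0.05 - alpha2_best, alpha2_best)
print(length_equal_tails, length_one_sided, alpha2_best, length_shortest)
```

In this small-sample case the optimum places almost all of the error probability in the lower tail, so the shortest interval is only slightly shorter than the one-sided interval of (b).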


EXERCISES 


10.1. Given a sample of n from a binomial population with constant
probability p. If we let $\alpha_1 = \alpha_2 = .025$ and assume that the sample
estimate of p, $\hat p$, is approximately $N(p, p(1 - p)/n)$, show that the
confidence interval is $(p_1 < p < p_2)$, where $p_1$ and $p_2$ are solutions of the
quadratic equation

$$n(\hat p - p)^2 = (1.96)^2\, p(1 - p).$$

10.2. Find the shortest confidence interval for the difference between
the means of two normal populations with the same variance, $\sigma^2$, with a
sample of $n_1$ from the first population and of $n_2$ from the second population.

10.3. (a) Show that the confidence interval for the ratio of the variance
of two normal populations

$$\theta = \sigma_1^2/\sigma_2^2$$

is given by

$$F_0/F_2 < \theta < F_0/F_1,$$

where

$$\int_{F_1}^{F_2} f(F)\, dF = 1 - \alpha$$

and $F_0 = s_1^2/s_2^2$ with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom,
respectively.

(b) Given $n_1 = 12$ and $n_2 = 25$ and $\alpha = .10$, determine values of $F_1$
and $F_2$ if $\alpha_1 = \alpha_2$. Suppose $s_1^2 = 20$ and $s_2^2 = 10$, what are the 90 per
cent confidence limits?

10.4. Use the data in Exercise 7.35 to set up 90 per cent confidence 
limits for the following ratios, used in statistical genetics: 


S' 


10.5. (For students who have studied advanced calculus.) Derive 
condition c of Example 10.2. 

10.6. Using condition c, derive the shortest confidence interval for 
Exercise 7.13(c). 

10.7. (a) Derive the general confidence limits for a linear function of
the sample values, that is,

$$l = \sum_{i=1}^{n} a_i X_i,$$

where the $X_i$'s are NID$(\mu,\sigma^2)$.

(b) Apply your result to li and h in Exercise 6.8, assuming r = 4, 
To = 50, Ti = 105, T 2 = 95, and Tz = 70, and = 12. 
CHAPTER 11

TESTS OF HYPOTHESES


11.1. Introduction. Statistical inference concerns itself in general
with two types of problems: estimation of population parameters and tests
of hypotheses. In the preceding three chapters, we have considered
problems of estimation. Desirable properties of a "good" estimator were
discussed; and a principle of estimation, the maximum-likelihood method,
was presented as a technique which in many cases may be easily used to
obtain estimators possessing many of these desirable properties.

In this chapter, we propose to discuss the general problem of tests of
hypotheses and present a principle, the likelihood-ratio criterion, which in
many cases will provide a "good" test criterion to be used in testing
hypotheses concerning population parameters. In discussing derived
sampling distributions in Chap. 7, it will be recalled that the distributions of
$t$, $\chi^2$, and $F$ were obtained and their uses in applied statistics in testing
hypotheses were pointed out and illustrated. In this discussion we wish
to investigate the theoretical justification for selecting a particular test
criterion for a particular problem in hand.

11.2. The General Problem. In order to test whether a given
hypothesis ($H_0$, the null hypothesis) is supported by a given set of data, we must
devise a rule of procedure, depending on the outcome of calculations
obtained from the sample, to decide whether to reject or not to reject $H_0$.
For example, in testing whether or not a given sample supports the
hypothesis that the observations were randomly selected from $N(0,1)$ we
calculate $T = \sqrt{n}\,\bar X$ and consider it a normal deviate with unit variance.
After a choice of an allowable error or probability of rejecting $H_0$ when
it is true, say $\alpha$, we find two regions such that, if $\bar X$ is in one region, we
reject $H_0$ and, if in the other region, we do not reject $H_0$. The first
region will be called the region of rejection, R, and is defined so that the
probability of the sample falling in R is $\alpha$, if $H_0$ is true. Designate T the
test criterion and $\alpha$ the significance level.

As indicated in Sec. 11.1 there are many different criteria for judging
the truth of a given hypothesis. For example, if we wish to test whether
or not a given sample could have been drawn from NID(0,1), we could
use any one of the following tests (and probably many more):

(i) $T = \sqrt{n}\,\bar X$, a normal deviate with unit variance.

(ii) $t = \sqrt{n}\,\bar X/s$, Student's test.

(iii) The $\chi^2$ test for the agreement of sample and
population variance.

(iv) Tests for skewness or kurtosis in the distribution (evidence of
nonnormality).

(v) Tests for serial correlation in the observations.

The likelihood-ratio criterion, mentioned earlier, may be used to indicate
which of the possible test criteria to use. Actually the likelihood ratio
defines a region of rejection R, which involves computing some test
criterion such as one of those mentioned above.

The observations in the sample $(X_1,X_2,\ldots,X_n)$ may be thought of
as representing the coordinates of a point, S, in n-dimensional space.
The space is divided into two regions: the region of rejection R and the
region of nonrejection. If S falls in R, we shall reject $H_0$; otherwise accept
it. The region R corresponds to the region outside the confidence interval
discussed in the previous chapter and is defined so that the probability of
rejecting a true hypothesis (the probability of S falling in R when $H_0$ is
true) is the significance level $\alpha$ (for example, $\alpha = .05$ or .01). This will
be indicated symbolically as

$$P(S \in R \mid H_0) = \alpha,$$

where $S \in R$ means that the sample point S is contained in R.

As with confidence intervals, there may be a large number of possible
regions R which satisfy this probability statement. For purposes of
making tests of significance, it seems reasonable to select that R for
which $P(S \in R)$ is maximized if the true hypothesis is not $H_0$. That is,
we want to reject the null hypothesis as often as possible when it is
not true. Hence, we are led to consider the possible alternatives to $H_0$.
Designate all alternatives as $H_a$. Symbolically we wish to maximize
$P(S \in R \mid H_a)$ for a fixed $P(S \in R \mid H_0) = \alpha$. In future discussions we shall
let $P(S \in R \mid H_a) = \beta_a$. The quantity $\beta_a$ is called the power of the test,
since it measures how powerful the test is in indicating a true difference
from $H_0$ when such a difference actually exists. In most cases there will
be an infinity of possible alternatives $H_a$, and $\beta_a$ will be different for each
$H_a$. Hence, in general, it will not be possible to maximize the power for
all alternative hypotheses. However, in certain cases, such a test can
be found and is called a uniformly most powerful test. It can be shown
that the likelihood ratio criterion will produce a uniformly most powerful
test if one exists.

An unbiased test is one for which the power is a minimum for $H = H_0$;
that is, we reject $H_0$ the least number of times when $H_0$ is the true
hypothesis. In case there is no over-all uniformly most powerful test, it would
appear that we should at least choose an unbiased test and, if possible,
select the uniformly most powerful test from among the unbiased ones.

The theory of tests of hypotheses under discussion was introduced by
Neyman and Pearson. Tests of significance are considered from the
point of view of errors of the first and second kind. In making a test of
$H_0$, two types of errors may be committed: (I) we may reject $H_0$ when it is
true; (II) we may accept $H_0$ when it is false. It follows that

$$P(\mathrm{I}) = \alpha, \qquad P(\mathrm{II}) = 1 - \beta_a.$$

Maximizing the power corresponds to minimizing the probability of
committing a Type II error for a fixed probability of committing a Type I
error.

The importance of taking into account the Type II error or the power
of a test may be illustrated as follows: It is desired to determine whether
or not a new variety of corn yields more than some accepted variety.
Suppose account is taken of a Type I error only and that the significance
probability is fixed at $\alpha = .05$, which guarantees that, if there is no real
difference between the new variety and the standard, significance shall be
indicated only 5 per cent of the time. In this case it would not be
necessary to perform an experiment at all. It would suffice to draw a bead at
random from a bowl whose composition is 19 white and 1 red. If it is
white, we accept $H_0$ (no difference between the varieties); if red, we reject
$H_0$. Using the procedure, we shall always reject $H_0$ 5 per cent of the
time, and this will be done whether $H_0$ is true or not. The defect in this
procedure is now apparent. If there is a real difference, we shall
recognize this only 5 per cent of the time, which also, in this case, is the power
of the test. It is now clear that we need a different testing procedure in
order that the probability of detecting true differences when they exist
may be large. The t test may be shown to maximize the power of the
test under certain conditions. This will be discussed later.

Cramér has summarized the problem of selecting a "good" test
criterion as follows: "In order that a test of the hypothesis $H_0$ should be
judged to be good, we should accordingly require that the test has a
small probability of rejecting $H_0$ when this hypothesis is true, but a large
probability of rejecting $H_0$ when it is false. Of two tests corresponding
to a probability $\alpha$ of rejecting $H_0$ when it is true, we should thus prefer
the one that gives the largest probability of rejecting $H_0$ when it is false."

11.3. Types of Hypotheses. A hypothesis, $H_0$, which specifies the
values of all parameters in the population distribution is called a simple
hypothesis; in other words, a simple hypothesis specifies that the
distribution is one specific member of a family of distributions. If $H_0$ does not
specify the values of all population parameters, it is called a composite
hypothesis. The hypothesis $H_0$ must be taken from a set of admissible
hypotheses, $\Omega$, which usually depend upon the form of the distribution.
For the normal parent distribution, $\Omega$ is $-\infty < \mu < \infty$,
$0 < \sigma^2 < \infty$. A composite hypothesis then states that a distribution
belongs to some subspace of the parameter space.

11.4. Simple Hypothesis for a One-parameter Distribution. In order
to illustrate the method of determining the power curve for a test,
consider the problem of testing some hypothesis concerning the value of the
mean, $\mu$, of a normal population with unit variance, $N(\mu,1)$. The sample
observations may be n differences obtained from n pairs of observations
such as in the corn-variety problem mentioned earlier. In this case the
population of differences would be assumed to have unit variance. Set
up the null hypothesis $H_0$: $\mu = \mu_0$. The set of admissible hypotheses is
$-\infty < \mu < \infty$, $\sigma^2 = 1$. The sample mean of the differences, $\bar X$, is to be
used to test $H_0$ against some alternative $H_a$.

From the discussion on confidence intervals, we know that the
probability of committing a Type I error is given by

$$\alpha = 1 - \sqrt{\frac{n}{2\pi}} \int_{\gamma_1}^{\gamma_2} e^{-n(\bar X - \mu_0)^2/2}\, d\bar X,$$

where $\gamma_2 = \mu_0 + T_2/\sqrt{n}$ and $\gamma_1 = \mu_0 + T_1/\sqrt{n}$. It will be shown later
that, for this particular problem, $\bar X$ is the "best" test criterion. If
$\alpha = .05$, the shortest confidence interval was shown to result when
$T_2 = -T_1 = 1.96$.

The integration is simplified by setting $T_0 = \sqrt{n}\,(\bar X - \mu_0)$ so that

$$\alpha = 1 - \frac{1}{\sqrt{2\pi}} \int_{T_1}^{T_2} e^{-T_0^2/2}\, dT_0.$$

Hence, if the calculated $\bar X$ is greater than $\gamma_2$ or less than $\gamma_1$ or, what
amounts to the same thing, if the value of $T_0$ for a particular problem is
greater than $T_2$ or less than $T_1$, then we say that we have evidence that
$H_0$ is false.

In order to evaluate the power of this test to detect real differences,
that is, $\mu = \mu_a \ne \mu_0$, we obtain the power function. The power function
will give the probability of rejecting $H_0$ when $\mu = \mu_a \ne \mu_0$. The
function should increase as the difference between $\mu_a$ and $\mu_0$ increases, since in
such cases it would be increasingly desirable to reject $H_0$. The problem
then is to determine the probability $\beta_a$ of obtaining $\bar X > \gamma_2$ or $\bar X < \gamma_1$
from $N(\mu_a,1)$ for fixed $\alpha$. Note that the power, $\beta_a$, equals $\alpha$ for $\mu_a = \mu_0$.
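$$\beta_a = 1 - \sqrt{\frac{n}{2\pi}} \int_{\gamma_1}^{\gamma_2} e^{-n(\bar X - \mu_a)^2/2}\, d\bar X
= 1 - \frac{1}{\sqrt{2\pi}} \int_{T_1}^{T_2} e^{-[T_0 - \sqrt{n}(\mu_a - \mu_0)]^2/2}\, dT_0$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{T_{1a}} e^{-T_a^2/2}\, dT_a + \frac{1}{\sqrt{2\pi}} \int_{T_{2a}}^{\infty} e^{-T_a^2/2}\, dT_a,$$

where

$$T_a = \sqrt{n}\,(\bar X - \mu_a) = T_0 - \sqrt{n}\,(\mu_a - \mu_0),$$
$$T_{1a} = T_1 - \sqrt{n}\,(\mu_a - \mu_0),$$
$$T_{2a} = T_2 - \sqrt{n}\,(\mu_a - \mu_0).$$

The value of the sum of the two integrals giving $\beta_a$ may be obtained from
the table of areas of the normal curve in the Appendix for given
$(\mu_a - \mu_0)$ as set out in Table 11.1.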




Table 11.1

                             (i)                          (ii)
$5(\mu_a - \mu_0)$    $T_{2a}$    $\beta_a$     $T_{2a}$    $T_{1a}$    $\beta_a$
      -3.0             4.645       .0000        4.960       1.040       .8508
      -2.0             3.645       .0001        3.960        .040       .5160
      -1.0             2.645       .0041        2.960       -.960       .1700
        .0             1.645       .0500        1.960      -1.960       .0500
       1.0              .645       .2594         .960      -2.960       .1700
       2.0             -.355       .6387        -.040      -3.960       .5160
       3.0            -1.355       .9123       -1.040      -4.960       .8508
       4.0            -2.355       .9907       -2.040      -5.960       .9793

As the row for $5(\mu_a - \mu_0) = 0$ shows, case (i) uses $T_1 = -\infty$, $T_2 = 1.645$ and
case (ii) uses $T_2 = -T_1 = 1.960$.
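The entries of Table 11.1 can be checked numerically. The following Python sketch is an illustration added here and is not part of the original computation; the function names `normal_cdf` and `power` are our own, and the argument `d` stands for $\sqrt n\,(\mu_a - \mu_0)$, which equals $5(\mu_a - \mu_0)$ in the table.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(d, t1, t2):
    """Power beta_a when sqrt(n) * (mu_a - mu_0) = d and H0 is rejected
    for T0 < t1 or T0 > t2.  Pass t1 = -math.inf for a one-tailed test."""
    lower_tail = normal_cdf(t1 - d)        # area below T_1a
    upper_tail = 1.0 - normal_cdf(t2 - d)  # area above T_2a
    return lower_tail + upper_tail

# Reproduce the two beta_a columns of Table 11.1.
for d in (-3, -2, -1, 0, 1, 2, 3, 4):
    case_i = power(d, -math.inf, 1.645)    # (i)  T1 = -infinity, T2 = 1.645
    case_ii = power(d, -1.960, 1.960)      # (ii) T2 = -T1 = 1.960
    print(f"{d:5.1f}   {case_i:.4f}   {case_ii:.4f}")
```

The printed values should agree with the tabled $\beta_a$ to within rounding in the fourth decimal place.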


It might be helpful if these results were illustrated graphically as
in Fig. 11.1. Let us consider the difference between the power for
$5(\mu_a - \mu_0) = d = 0$ and the power for $d = 1$. The small shaded area
is the rejection region for $d = 0$ with $\alpha = .05$. The upper shaded area
is the added amount to the rejection area for $d = 1$. The sum of the two
shaded areas gives the power ($\beta_a$) for rejecting the hypothesis $H_0\colon \mu = \mu_0$
when $\mu$ is actually $\mu_0 + .2$, since $1 = 5(\mu_a - \mu_0)$. In this case $\beta_a = .2594$.

Graphs of various power functions, depending on the values of $T_1$ and
$T_2$ with fixed $P(\mathrm{I}) = \alpha$ and on the position of the minimum point,
are set out in Fig. 11.2.


(i) $T_1 = -\infty$; $T_2 = 1.645$.

(ii) Minimum at $\mu_a = \mu_0$; $T_2 = 1.96$, $T_1 = -1.96$.

(iii) $T_2 = \infty$; $T_1 = -1.645$.

(iv) Minimum at $\mu_a < \mu_0$, for example, $T_2 = 1.75$, $T_1 = -2.33$.

(v) Minimum at $\mu_a > \mu_0$, for example, $T_2 = 2.33$, $T_1 = -1.75$.

From the definition of $\beta_a$, we see that the minimum point is reached when

$$T_2 - \sqrt n\,(\mu_a - \mu_0) = -[T_1 - \sqrt n\,(\mu_a - \mu_0)]$$

or

$$T_1 + T_2 = 2\sqrt n\,(\mu_a - \mu_0).$$

This relation is found by differentiating $\beta_a$ with respect to $\mu_a$ and setting
the derivative equal to zero. It follows that if $\mu_a = \mu_0$, the minimum
point is at $T_1 = -T_2$.
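The differentiation can be written out as follows. This is a sketch of the omitted step; the symbols $\Phi$ and $\varphi$ (the cumulative distribution function and density of $N(0,1)$) are notation introduced here for convenience. With $T_{ia} = T_i - \sqrt n\,(\mu_a - \mu_0)$,

$$\beta_a = \Phi(T_{1a}) + 1 - \Phi(T_{2a}),$$

$$\frac{d\beta_a}{d\mu_a} = \sqrt n\,[\varphi(T_{2a}) - \varphi(T_{1a})] = \sqrt{\frac{n}{2\pi}}\left[e^{-T_{2a}^2/2} - e^{-T_{1a}^2/2}\right] = 0,$$

so that $T_{2a}^2 = T_{1a}^2$. Since $T_2 > T_1$, the two limits cannot be equal, and therefore $T_{2a} = -T_{1a}$, which is the relation stated above.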






It is instructive to examine these power curves from the standpoint of
the following types of alternative hypotheses:

(i) If $H_a$ asserts $\mu_a > \mu_0$, power curve (i) is uniformly most powerful,
since its power is greater than that of any other curve for all such $H_a$.
In this case we are willing to accept $H_0\colon \mu = \mu_0$ even though the true
hypothesis is $\mu < \mu_0$. Hence, the region of rejection is $T > 1.645$ for
$\alpha = .05$.


(iii) Similarly, power curve (iii) is uniformly most powerful for testing
the hypothesis $\mu = \mu_0$ against the alternatives $\mu_a < \mu_0$. The region of
rejection is $T < -1.645$ for $\alpha = .05$.

(ii) There is no uniformly-most-powerful test for testing $H_0$ against the
alternative $H_a\colon \mu_a \ne \mu_0$. This result is evident from a study of (i) and
(iii). Each of these is uniformly most powerful on opposite sides of
$\mu = \mu_0$ but has practically no power on the other side. No other single
test can be as powerful as (i) on the right or (iii) on the left. For
$H_a\colon \mu_a \ne \mu_0$ we must adopt some compromise rejection region. Neyman
and Pearson have suggested using an unbiased test, which in this case
would lead to the use of power curve (ii). For this curve $T_1 = -T_2$. It
should be noted that curves (iv) and (v) are more powerful than (ii) on
one tail but give a power less than $\alpha$ for some alternatives. It should be
emphasized that the Type I error (probability of rejecting $H_0$ when
$\mu = \mu_0$) is constant for all of these power curves.
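The remark that curves (iv) and (v) fall below $\alpha$ for some alternatives can be illustrated numerically. The sketch below is our own illustration (the names `power` and `CURVES` are hypothetical); it evaluates each curve at $\mu_a = \mu_0$ and at its minimum point $\sqrt n\,(\mu_a - \mu_0) = (T_1 + T_2)/2$.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(d, t1, t2):
    """Power when sqrt(n) * (mu_a - mu_0) = d, rejecting for T0 < t1 or T0 > t2."""
    return normal_cdf(t1 - d) + 1.0 - normal_cdf(t2 - d)

# (T1, T2) for three of the power curves; each has size close to .05.
CURVES = {"(ii)": (-1.96, 1.96), "(iv)": (-2.33, 1.75), "(v)": (-1.75, 2.33)}

for name, (t1, t2) in CURVES.items():
    d_min = (t1 + t2) / 2.0          # position of the minimum of the curve
    print(name,
          "size =", round(power(0.0, t1, t2), 4),
          "minimum at d =", round(d_min, 2),
          "minimum power =", round(power(d_min, t1, t2), 4))
```

For curve (ii) the minimum power coincides with the size of the test, whereas for (iv) and (v) the minimum lies below it, which is what is meant by saying that these tests are biased.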

11.5. Composite Hypotheses. The theory of tests of composite
hypotheses has not been completed. Some methods will be illustrated
by use of the single-tailed $t$ test. Given a sample of $n$ from $N(\mu, \sigma^2)$, it is
desired to test the null hypothesis, $H_0\colon \mu = 0$, $0 < \sigma^2 < \infty$, against the
alternative hypothesis, $H_a\colon \mu > 0$, $0 < \sigma^2 < \infty$. The admissible hypotheses
are $\Omega\colon -\infty < \mu < \infty$, $0 < \sigma^2 < \infty$. The null hypothesis is composite
since it does not specify the value of $\sigma^2$.

From our study of confidence intervals, it seems reasonable to use
$-\infty < \bar X < t_2 s/\sqrt n$ as the acceptance region, where

$$\int_{t_2}^{\infty} f(t)\, dt = \alpha.$$

The rejection region is $t = \bar X \sqrt n/s > t_2$. (If $H_a$ was $\mu < 0$, we should
use the acceptance region $t > t_1 = -t_2$ or rejection region $t < t_1$. Both
of these yield uniformly-most-powerful tests. However, if $H_a$ was
$\mu \ne 0$, no uniformly-most-powerful test is available. In this case we
might make use of the unbiased test with acceptance region $t_1' < t < t_2'$,
where $t_1' = -t_2'$ and $\alpha_1 = \alpha_2 = \alpha/2$.) A more rigorous treatment of this
problem is beyond the scope of this text.

The determination of the power for a composite test is, in general,
quite complicated owing to the nonspecificity of certain of the parameters
by the null hypothesis. In using the $t$ test, $H_0$ does not specify the value
of $\sigma^2$; hence, we must make use of the estimate of $\sigma^2$, $s^2$, from the sample.
We must determine the probability that $t = \bar X \sqrt n/s > t_2$, if the sample
has been drawn from a population with mean $\mu \ne 0$. Now, when
$t > t_2$, we reject $H_0$; and if $\mu$ is actually greater than zero, a correct
decision has been made. The probability of making this correct decision
is called the power of the test for a given $\mu$ ($> 0$) and $\alpha$. It will be recalled
that $\alpha$ is the probability of stating $\mu > 0$ when it actually is zero.

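To evaluate the power of the $t$ test, we recall the joint distribution of
$\bar X$ and $s^2$ from Chap. 7,

$$f(\bar X, s^2) = f_1(\bar X)\, f_2(s^2),$$

where

$$f_1(\bar X) = \frac{\sqrt n}{\sigma\sqrt{2\pi}}\, e^{-n(\bar X - \mu)^2/2\sigma^2}$$

and

$$f_2(s^2) = \frac{1}{\Gamma[(n-1)/2]} \left(\frac{n-1}{2\sigma^2}\right)^{(n-1)/2} (s^2)^{(n-3)/2}\, e^{-(n-1)s^2/2\sigma^2}.$$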

Now, the power of the $t$ test to detect a true mean, $\mu = \mu_a$, is given by
$P\{t = \bar X \sqrt n/s > t_2 \mid \mu = \mu_a;\ P(\mathrm{I}) = \alpha\} = \beta_a$, where

$$P(\mathrm{I}) = P\{t > t_2 \mid \mu = 0\} = \alpha.$$

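It will be recalled that the $t$ distribution was obtained from that of $\bar X$ and
$s^2$ in Chap. 7. Since $P(t > t_2) = P(\bar X > s t_2/\sqrt n)$, we may find the
power of the test from

$$\beta_a = \int_0^{\infty} f_2(s^2)\, d(s^2) \int_{s t_2/\sqrt n}^{\infty} \frac{\sqrt n}{\sigma\sqrt{2\pi}}\, e^{-n(\bar X - \mu_a)^2/2\sigma^2}\, d\bar X,$$

where it is understood that we first determine $t_2$ so that $P(\mathrm{I}) = \alpha$ and
then $\beta_a$.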

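Let $\mu_a \sqrt n/\sigma = t_a$ (known value) and $\bar X \sqrt n/\sigma = t_a + y$; then

$$\beta_a = \int_0^{\infty} f_2(s^2)\, d(s^2) \int_{(s t_2/\sigma) - t_a}^{\infty} N(0,1)\, dy.$$

But $s^2/\sigma^2 = \chi^2/(n - 1)$, and hence

$$\beta_a = \int_0^{\infty} f(\chi^2)\, d(\chi^2) \int_{(\chi t_2/\sqrt{n-1}) - t_a}^{\infty} N(0,1)\, dy.$$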

The evaluation of $\beta_a$ must be accomplished by some form of numerical
integration over the region $y > (\chi t_2/\sqrt{n - 1}) - t_a$ and $\chi^2 > 0$. If we
let

$$y_2 = \frac{\chi t_2}{\sqrt{n-1}} - t_a \qquad\text{and}\qquad p = \int_0^{\chi} f(u)\, du,$$

then

$$\beta_a = \int_0^1 dp \int_{y_2}^{\infty} N(0,1)\, dy.$$

Since $p = 0$ when $\chi = 0$ and $p = 1$ when $\chi = \infty$, we may change to a
$(p, y)$ system where $0 < p < 1$. We may now compute the power of the
test to detect a given value of $t_a \ne 0$ for a fixed $t_2$ and $n$. In order to
compute the power for $t = t_a$, we proceed as follows:

(i) Set down successive values of $p$.

(ii) Ascertain the values of $\chi$ corresponding to each $p$.

(iii) Compute the value of $y_2$ for each $\chi$.

(iv) Determine the area, $P$, under the normal curve between $y_2$ and $\infty$.

If we plot the values of $P$ as ordinates with the corresponding $p$ values
as the abscissas, then $\beta_a$ is given by the area under this curve as illustrated
in Fig. 11.3. This area may be computed by some method of numerical
integration, such as the trapezoidal rule or Simpson's rule.





Neyman and Tokarska have published values of $t_a$ for $\beta_a$† = .99, .95,
.90(.10).10. Using the procedure outlined above, let us calculate $\beta_a$ for
$t_a = 1.15$, which is the value of $t_a$ given in the Neyman and Tokarska
tables, corresponding to $\alpha = .05$, $\beta_a = .20$, and $n$† $= 3$. If $\alpha = .05$,
then $t_2 = 2.920$ and $y_2 = 2.065\chi - 1.15$. We now obtain the entries in
Table 11.2.

Using the trapezoidal rule, we obtain $\beta_a = .2017$ as compared with the
actual value of .20 mentioned earlier.
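Steps (i) to (iv) and the trapezoidal rule can be sketched in Python as below. This is an illustration under stated assumptions rather than the original computation: the names are our own, the grid of $p$ values is the one used in Table 11.2, and the closed form $p = 1 - e^{-\chi^2/2}$ used to recover $\chi$ from $p$ holds only for 2 degrees of freedom ($n = 3$).

```python
import math

DF = 2        # degrees of freedom, n - 1, for a sample of n = 3
T2 = 2.920    # one-tailed 5 per cent point of t with 2 degrees of freedom
T_A = 1.15    # standardized alternative t_a = mu_a * sqrt(n) / sigma

# Step (i): the values of p used in Table 11.2, in increasing order.
P_GRID = [0.0, 0.0025, 0.005, 0.010, 0.015, 0.020, 0.025, 0.05, 0.075,
          0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]

def normal_cdf(z):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi_for_p(p):
    """Step (ii): chi such that the area of its density from 0 to chi is p.
    Valid for 2 degrees of freedom only, where p = 1 - exp(-chi**2 / 2)."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

def upper_area(p):
    """Steps (iii) and (iv): y2 for this chi, then the normal area above y2."""
    y2 = chi_for_p(p) * T2 / math.sqrt(DF) - T_A
    return 1.0 - normal_cdf(y2)

def power_by_trapezoid(p_grid):
    """Area under the curve of P against p by the trapezoidal rule."""
    points = [(p, upper_area(p)) for p in p_grid]
    points.append((1.0, 0.0))   # P tends to 0 as p tends to 1
    return sum((p1 - p0) * (a0 + a1) / 2.0
               for (p0, a0), (p1, a1) in zip(points, points[1:]))

beta_a = power_by_trapezoid(P_GRID)
print(f"beta_a = {beta_a:.4f}")
```

On this grid the result should be close to the value .2017 quoted above; a finer grid or Simpson's rule would bring it nearer to the tabled .20.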

Table 11.2

   $p$        $\chi$       $y_2$        $P$
  .90        2.146        3.282      .00052
  .80        1.794        2.555      .00531
  .70        1.552        2.054      .01996
  .60        1.354        1.645      .04994
  .50        1.177        1.281      .10004
  .40        1.011         .9373     .17430
  .30         .8446        .5941     .27622
  .25         .7585        .4163     .33860
  .20         .6680        .2295     .40924
  .15         .5701        .0273     .48911
  .10         .4590       -.2021     .58008
  .075        .3949       -.3346     .63104
  .05         .3203       -.4886     .68744
  .025        .2250       -.6853     .75342
  .020        .2010       -.7349     .76880
  .015        .1739       -.7910     .78553
  .010        .1418       -.8572     .80133
  .005        .1001       -.9433     .82724
  .0025       .07075     -1.0039     .84229
  .00         .00        -1.1500     .87493

† $\beta_a$ may be found from the tables of Neyman and Tokarska by the relationship
$\beta_a = 1 - P_{II}$. The $n$ of the tables is the degrees of freedom.

11.6. Use of Power-function Tables in Planning Experiments. In
case the experimenter has in advance some knowledge of the size of the
coefficient of variation, that is, the standard deviation of any observation
expressed as a per cent of the general mean, it will then be possible to
make use of the tables of Neyman and Tokarska in the planning of
experiments.

Example 11.1. (Due to Neyman and Tokarska.) A plant breeder
wishes to compare a new variety, $V_1$, with an established standard $V_0$.
Let $\mu_1$ and $\mu_0$ denote the true mean yields of $V_1$ and $V_0$ per some unit of
area, respectively. The hypothesis to be tested is $H_0\colon \mu_1 \le \mu_0$ and the
alternative hypothesis is $H_a\colon \mu_1 > \mu_0$. In other words, the plant breeder
will consider his problem of producing a better variety as successfully
accomplished whenever he obtains evidence that $H_0$ is not true and
therefore that $\mu_1 > \mu_0$. It is desired to reduce the probability of an unjust
rejection of $H_0$ to $\alpha = .01$. In a completely randomized experiment each
variety is repeated $n' = 8$ times, and hence the pooled experimental-error
degrees of freedom is 14. According to previous experience the standard
deviation of any single yield is expected to be $\sigma_0 = 6$ per cent of the
general mean yield. The experimenter now wishes to know the size of
differences between the mean yields of varieties $V_0$ and $V_1$ (in favor of
the new variety $V_1$) which he is likely to detect in his experiment in case
they in fact exist.

Now, in order to use Table II from Neyman and Tokarska, we find the
standard deviation of the difference of the two means as

$$\sigma = \sigma_0 \sqrt{2/n'} = 6\sqrt{2/8} = 3 \text{ per cent of the general mean.}$$

But $\Delta = \rho\sigma = 3\rho$ per cent of the general mean, where $\Delta = \mu_1 - \mu_0$ and
$\rho = (\mu_1 - \mu_0)/\sigma$ by definition. Then, entering Table II opposite $n = 14$
degrees of freedom, we multiply the tabled values of $\rho$ by 3 to obtain the
entries in Table 11.3. The first pair of entries means that if the true
difference in mean yield of $V_1$ over $V_0$ is as large as 15.54 per cent of the
general mean yield, then the experiment described will detect this
difference in 99 per cent of the cases. From the table it may be seen that a
reasonable probability of detection such as .90 or .80 corresponds to true
differences in yields exceeding 10 per cent of the general mean and that
differences under 5 per cent have a probability of only .20 of being
detected. The experimenter now possesses information enabling him to
judge the adequacy of the proposed experiment. The experiment would
be judged satisfactory if it is desired to discover differences over 10 per
cent. On the other hand, if the process of improving the particular
varieties is well advanced, a difference as large as 5 per cent may be as
large as could be expected. In the latter case the proposed experiment
is not satisfactory, and some modification is in order. Increased
precision may be obtained by (i) increasing the number of repetitions and
thereby increasing the degrees of freedom, (ii) improving the experimental
techniques and thereby decreasing the standard deviation of any single
plot yield, or (iii) increasing the size of $\alpha$.

Table 11.3

Level of Significance $\alpha = 0.01$

Size of real differences in average yields,
in percentages of the average yield          15.54  13.26  12.03  10.53   9.48   6.87   5.97   4.92   3.45
Probability of detection                       .99    .95    .90    .80    .70    .40    .30    .20    .10

Example 11.2. Tang has obtained the functional form of the power
function of the analysis-of-variance tests, and tables with illustrations
of their uses. While the derivations are beyond the scope of this text,
it is instructive to consider one of Tang's examples illustrating the use of
his tables in planning experiments. A randomized-blocks experiment is
planned to compare four treatments ($k = 4$), replicated five ($n = 5$)
times. Let $\delta_i$ be the difference between the true $i$th-treatment effect and
the true general mean, so that $\sum_{i=1}^{k} \delta_i = 0$. Suppose that for the
experiment the $\delta_i$ have values $-5, -4, 3, 6$ expressed as percentages of the true
mean yield per plot. Further, suppose from past experience that the true
standard deviation per plot, $\sigma$, is 10 per cent of the general mean. In
order to enter Tang's tables we calculate

$$\phi = \frac{1}{\sigma}\sqrt{\frac{n \sum \delta_i^2}{k}} = \frac{1}{10}\sqrt{\frac{5(25 + 16 + 9 + 36)}{4}} = 1.04.$$

Entering Table II from Tang, with degrees of freedom $f_1 = 3$, $f_2 = 12$,
and $\phi = 1.04$, we find $P_{II} = .7$ roughly. This means that true treatment
differences, such as those given above, would be significant at the 5 per
cent level in about 3 experiments out of 10 only.
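The quantity needed to enter Tang's tables is easily computed. The short Python sketch below is only an illustration (the function name `tang_phi` is our own); it reproduces $\phi$ and the two degrees of freedom for the randomized-blocks layout of the example. The value of $P_{II}$ itself must still be read from the tables.

```python
import math

def tang_phi(effects, replications, sigma):
    """phi = sqrt(n * sum(delta_i**2) / k) / sigma for k treatment effects
    delta_i (summing to zero), n replications and plot standard deviation sigma."""
    k = len(effects)
    return math.sqrt(replications * sum(d * d for d in effects) / k) / sigma

effects = [-5, -4, 3, 6]      # delta_i, per cent of the mean yield per plot
replications = 5              # n
sigma = 10                    # per cent of the general mean

phi = tang_phi(effects, replications, sigma)
f1 = len(effects) - 1                           # treatments
f2 = (len(effects) - 1) * (replications - 1)    # error, randomized blocks
print(round(phi, 2), f1, f2)
```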

In practice the true treatment differences are not known, but use may
be made of the fact that if $\phi$ were as large as some specified value $\phi_0$, say,
the probability $P_{II}$ of failing to detect the existence of treatment
differences may be obtained from Tang's tables.

In a second example Tang considers a randomized-blocks experiment
with $k = 6$ treatments and $n = 7$ blocks. Then $f_1 = 5$, $f_2 = 30$, and
Table II, appropriate when using the 5 per cent significance level, shows
$P_{II} = .262$ for $\phi = 1.5$. In this case we would fail to detect the presence
of treatment differences in about one in four times when $\phi$ is as large as
1.5 or when

$$\frac{1}{\sigma}\sqrt{\frac{\sum \delta_i^2}{k}} = \frac{1.5}{\sqrt 7} = 0.567.$$

Assuming the standard deviation of a plot to be about 10 per cent of the
mean yield, then $\sqrt{\sum \delta_i^2/6} = 5.67$ per cent of the mean yield.

Now, there will be an unlimited number of sets of six values of $\delta_i$ whose
sum will be zero and having 5.67 as standard deviation. In order to
obtain upper and lower bounds for at least one value of the $\delta_i$, we
consider the two extreme sets

(a) $\delta_1 = \delta_2 = \delta_3 = \delta_4 = \delta_5 = -\delta_6/5$,

(b) $\delta_1 = \delta_2 = \delta_3 = -\delta_4 = -\delta_5 = -\delta_6$.

For (a) we find

$$\sqrt{\frac{\sum \delta_i^2}{6}} = \frac{\delta_6}{\sqrt 5} = 5.67, \qquad \delta_6 = 12.68 \text{ per cent of the mean yield per plot,}$$

and for (b)

$$\sqrt{\frac{\sum \delta_i^2}{6}} = \delta_6 = 5.67 \text{ per cent of the mean yield per plot.}$$

It may be proved, for this example, that there must be at least one $\delta$, say
$\delta_q$, whose value lies between 12.68 and 5.67.
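The two bounds can be verified by direct arithmetic. The following Python lines are a sketch with variable names of our own choosing; they build the extreme sets (a) and (b) explicitly and confirm that each sums to zero and has the required root-mean-square value.

```python
import math

phi, blocks, k, sigma = 1.5, 7, 6, 10.0

# Root-mean-square treatment effect implied by phi = 1.5,
# in per cent of the mean yield: sigma * phi / sqrt(n).
rms = sigma * phi / math.sqrt(blocks)

def root_mean_square(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Set (a): five equal effects balanced by a single large one.
upper = rms * math.sqrt(k - 1)
set_a = [-upper / (k - 1)] * (k - 1) + [upper]

# Set (b): three effects of one sign and three of the other, equal in size.
lower = rms
set_b = [-lower] * 3 + [lower] * 3

print(round(rms, 2), round(upper, 2), round(lower, 2))
```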

11.7. The Likelihood-ratio Criterion. In Sec. 9.2 the method of
maximum likelihood was presented as a general method, involving routine
mathematical procedures, for obtaining an estimator of a population
parameter possessing many desirable properties. In an analogous
manner the likelihood-ratio criterion will now be presented as a general method,
involving routine mathematical procedures, for obtaining a "good" test
criterion.

The procedure for obtaining a likelihood-ratio criterion to be used in
testing the hypothesis that $f(X; \theta_1, \theta_2, \ldots, \theta_k)$ belongs to the subspace $\omega$
of the entire parameter space $\Omega$ on the basis of the random sample $\{X_i\}$
drawn from the population of $f(X; \theta_1, \theta_2, \ldots, \theta_k)$ is set out below.

Let the likelihood of the sample be

$$P_s = \prod_{i=1}^{n} f(X_i; \theta_1, \theta_2, \ldots, \theta_k).$$

This likelihood will usually have a maximum as the parameters vary over
the entire parameter space $\Omega$. Denote this maximum value by

$$P_s(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k)$$

or briefly as $P_s(\hat\Omega)$. Similarly, $P_s$ will usually have a maximum value in $\omega$
which shall be denoted as $P_s(\hat\omega)$. Then the likelihood-ratio criterion for
the hypothesis to be tested is

$$\lambda = \frac{P_s(\hat\omega)}{P_s(\hat\Omega)}.$$

The estimators $\hat\theta_i$ of the population parameters $\theta_i$, which are obtained as
quantities to be substituted in $P_s$ in determining $P_s(\hat\omega)$ and $P_s(\hat\Omega)$, are
derived by the method of maximum likelihood. It follows that $\lambda$ is a
function of the sample observations only, that is, it does not involve any
population parameters.

Since $P_s$ is positive as a result of being the product of density functions
and $P_s(\hat\omega)$ is less than or at most equal to $P_s(\hat\Omega)$ because we are more
restricted in maximizing $P_s$ in $\omega$ than in $\Omega$, $\lambda$ will be a positive fraction.
Its range will be from 0 to 1.

In order to use $\lambda$ as a test criterion in applied statistics, it is necessary
that we obtain the sampling distribution of $\lambda$ on the assumption that the
hypothesis being tested is true. We note that $\lambda$ will be small if $P_s(\hat\omega)$ is
smaller than $P_s(\hat\Omega)$. We shall wish to reject the hypothesis to be tested
in case $\lambda$ is small.

We now find a $\lambda_\alpha$ such that $P(\lambda \le \lambda_\alpha) = \alpha$ on the assumption that the
hypothesis to be tested is true. If the calculated value $\lambda_0$ is less than or
equal to $\lambda_\alpha$, that is, if $\lambda_0 \le \lambda_\alpha$, we reject the hypothesis; otherwise we
accept it. It should be noted that any monotonic function of $\lambda$ may be
used in place of $\lambda$ as the test criterion.

As indicated earlier, a "good" test criterion is one which determines a
region which maximizes the power of detecting true deviations from the
null hypothesis for a given probability of committing a Type I error. In
general it will not be possible to find a region of rejection which will
maximize the power for all alternatives to the null hypothesis. However,
for the simple case of only one alternative, $H_1$, and when both $H_0$ and $H_1$
are simple hypotheses, it can be shown that the likelihood-ratio test
defines a best critical region. In this case the whole parameter space $\Omega$
contains only two points. If the alternative hypotheses $H_a$ specify the
entire parameter space $\Omega$ other than $\omega$ to be a range of values on a line,
then it is possible to choose a best critical region for each $H_a$. If this
region is the same for each $H_a$, then the test is said to be uniformly most
powerful.

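Example 11.3. Given a random sample of $n$ from $N(\mu, 1)$. The null
hypothesis to be tested is $H_0\colon \mu = \mu_0$, which states that $\omega$ is a point while
$\Omega$ is the whole $\mu$ axis. The likelihood of the sample is

$$P_s = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum (X - \mu)^2}$$

or

$$P_s = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum (X - \bar X)^2 - (n/2)(\bar X - \mu)^2}.$$

Since the ML estimator for $\mu$ is $\bar X$, we find the maximum value of $P_s$ in $\Omega$
to be

$$P_s(\hat\Omega) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum (X - \bar X)^2}.$$

Also,

$$P_s(\hat\omega) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum (X - \bar X)^2 - (n/2)(\bar X - \mu_0)^2}.$$

The likelihood ratio becomes

$$\lambda = e^{-(n/2)(\bar X - \mu_0)^2}.$$

If $\bar X$ is close to $\mu_0$ in value, then the sample is reasonably consistent with
the null hypothesis $H_0$ and $\lambda$ will be close to 1 in value. Conversely, if $\bar X$
is far from $\mu_0$, the sample will not be reasonably consistent with $H_0$, and
$\lambda$ will ordinarily be close to zero.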
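A small numerical sketch may make the criterion of Example 11.3 concrete. The sample values below are hypothetical, chosen only for illustration, and the function names are our own. Because $-2 \log \lambda = n(\bar X - \mu_0)^2 = T_0^2$, rejecting for small $\lambda$ is the same as rejecting for large $|T_0|$; the 5 per cent critical value of $\lambda$ used here, $e^{-1.96^2/2}$, follows from that relation.

```python
import math

def log_likelihood(sample, mu):
    """Log of P_s for a sample from N(mu, 1)."""
    n = len(sample)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sum((x - mu) ** 2 for x in sample)

def likelihood_ratio(sample, mu0):
    """lambda = exp(-(n/2) * (xbar - mu0)**2) for H0: mu = mu0, variance 1."""
    n = len(sample)
    xbar = sum(sample) / n
    return math.exp(-0.5 * n * (xbar - mu0) ** 2)

sample = [0.3, 1.1, -0.4, 0.9, 1.6]   # hypothetical observations
mu0 = 0.0

lam = likelihood_ratio(sample, mu0)

# The same ratio computed directly as P_s(omega-hat) / P_s(Omega-hat).
xbar = sum(sample) / len(sample)
direct = math.exp(log_likelihood(sample, mu0) - log_likelihood(sample, xbar))

critical = math.exp(-0.5 * 1.96 ** 2)  # lambda_alpha for alpha = .05
reject = lam <= critical
print(round(lam, 4), round(critical, 4), reject)
```

For these five values $\bar X = 0.7$, so that $\lambda = e^{-1.225}$, which exceeds the critical value, and $H_0$ would not be rejected.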

Now, suppose for the above example, or in general, the distribution of
$\lambda$ when $H_0$ is true is $g(\lambda)$ and $P(\mathrm{I}) = \alpha$; then $\lambda_\alpha$ is determined so that

$$\int_0^{\lambda_\alpha} g(\lambda)\, d\lambda = \alpha.$$

If the calculated $\lambda$, say $\lambda_0$, be less than $\lambda_\alpha$, we would reject $H_0$, and vice
versa.

From the discussion in the above paragraph it follows that the
likelihood-ratio method as described may not always lead to a unique test.
If $H_0$ is a simple hypothesis, a unique distribution of $\lambda$ is obtained.
On the other hand, if $H_0$ is a composite hypothesis, it will not in general
be possible to obtain a unique distribution for $\lambda$ and hence no unique test.

Example 11.4. Given a sample of $n$ from $N(\mu, \sigma^2)$. The null
hypothesis to be tested is $H_0\colon \mu = 0$, $\sigma^2$ is unspecified. The entire parameter
space is the half plane of Fig. 11.4. The subspace specified by $H_0$ is the
vertical line $\mu = 0$.

[Fig. 11.4]

The likelihood of the sample is


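$$P_s = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum (X - \mu)^2/2\sigma^2}.$$

The values of $\mu$ and $\sigma^2$ which maximize $P_s$ in $\Omega$ have already been found
to be

$$\hat\mu = (1/n)\sum X = \bar X,$$

$$\hat\sigma^2 = (1/n)\sum (X - \bar X)^2.$$

Hence,

$$P_s(\hat\Omega) = \left[\frac{n}{2\pi \sum (X - \bar X)^2}\right]^{n/2} e^{-n/2}.$$

Now, we know that if $H_0$ is true, $\mu = 0$ and the maximizing value of $\sigma^2$ is

$$\hat\sigma^2 = \frac{1}{n}\sum X^2.$$

Hence

$$P_s(\hat\omega) = \left[\frac{n}{2\pi \sum X^2}\right]^{n/2} e^{-n/2}$$

and

$$\lambda = \left[\frac{\sum (X - \bar X)^2}{\sum X^2}\right]^{n/2}.$$

Then, since $\sum X^2 = \sum (X - \bar X)^2 + n\bar X^2$,

$$\lambda^{2/n} = \frac{\sum (X - \bar X)^2}{\sum (X - \bar X)^2 + n\bar X^2} = \frac{1}{1 + \dfrac{n\bar X^2}{\sum (X - \bar X)^2}} = \frac{1}{1 + t^2/(n - 1)}.$$

In this case, then, the likelihood-ratio test becomes the $t$ test since $t^2$ is a
monotonic function of $\lambda$. Then

$$P(\lambda < \lambda_\alpha) = P(|t| > t_\alpha) = \alpha.$$

The region for rejection for $t$ is $|t| > t_\alpha$, and we see that large values of $|t|$
correspond to small values of $\lambda$.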
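The identity between $\lambda$ and $t$ in Example 11.4 can be confirmed on any set of numbers. The sketch below uses hypothetical observations and a function name of our own; it computes $\lambda$ from the two maximized likelihoods and $t$ from its usual definition, and compares $\lambda^{2/n}$ with $1/[1 + t^2/(n - 1)]$.

```python
import math

def lambda_and_t(sample):
    """Return the likelihood ratio for H0: mu = 0 (sigma^2 unspecified)
    and the statistic t = xbar * sqrt(n) / s for the same sample."""
    n = len(sample)
    xbar = sum(sample) / n
    ss_about_mean = sum((x - xbar) ** 2 for x in sample)   # sum (X - Xbar)^2
    ss_about_zero = sum(x * x for x in sample)             # sum X^2
    lam = (ss_about_mean / ss_about_zero) ** (n / 2.0)
    s = math.sqrt(ss_about_mean / (n - 1))
    t = xbar * math.sqrt(n) / s
    return lam, t

sample = [0.3, 1.1, -0.4, 0.9, 1.6]   # hypothetical observations
n = len(sample)
lam, t = lambda_and_t(sample)

left = lam ** (2.0 / n)
right = 1.0 / (1.0 + t * t / (n - 1))
print(round(lam, 4), round(t, 4), round(left, 6), round(right, 6))
```

The last two printed numbers agree, so that a small $\lambda$ and a large $|t|$ pick out the same samples.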

EXERCISES 

11.1. Construct the power curve for $H_a\colon \mu_a < \mu_0$. Consider values of
$5(\mu_a - \mu_0) = \pm 4, \pm 3, \pm 2, \pm 1, 0$, given the conditions of the example of
Sec. 11.4.

11.2. Discuss rejection regions for testing the hypothesis that the
difference between the population means of two variates, each, respectively,
from $N(\mu_i, \sigma^2)$, $i = 1, 2$, is zero. Take $\Omega\colon -\infty < \mu_1, \mu_2 < \infty$, $0 < \sigma^2 < \infty$,
and $H_0\colon \mu_1 = \mu_2$. Hint: Consider the example of Sec. 11.5.

11.3. Set up the admissible hypotheses $\Omega$ and the region of rejection,
$R$, for testing the hypothesis that the population variance of $N(\mu, \sigma^2)$ is
$\sigma_0^2$, based on a random sample of size $n$. What can be said regarding the
power of the test for various alternatives?

11.4. Determine $\Omega$ and the region of rejection, $R$, for testing the
hypothesis that the variances of two normal populations are equal ($\sigma_1^2 = \sigma_2^2$)
against the alternative hypothesis that $\sigma_1^2 = a\sigma_2^2$ ($a > 1$). Since the test
criterion is $F$, show that $F_a = F_\alpha/a$ and that

$$\beta_a = \int_{F_a}^{\infty} f(F)\, dF,$$

where $f(F)$ is the density of $F$ with $n_1 = (df)_1$, $n_2 = (df)_2$. Complete the
following power table for $n_1 = 2$, $n_2 = 10$:

$a$

11.5. Derive the $\lambda$ criterion for testing the null hypothesis $H_0\colon m = m_0$,
given a random sample of size $n$ from the Poisson distribution

$$f(X) = \frac{e^{-m} m^{X}}{X!},$$

$0 \le X < \infty$; $m > 0$. The parameter space $\Omega$ is the half line $m > 0$.

11.6. Given two random samples of sizes $n_1$ and $n_2$ from each of two
populations $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$. Derive the $\lambda$ criterion for testing the
null hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2 = \sigma^2$ (unspecified). The entire parameter
space $\Omega$ is the quarter plane determined by $\sigma_1^2 > 0$, $\sigma_2^2 > 0$. The subspace
$\omega$ is the line $\sigma_1^2 = \sigma_2^2$ (unspecified). Hint: First determine the joint
distribution of $s_1^2$ and $s_2^2$, the sample estimates of $\sigma_1^2$ and $\sigma_2^2$, respectively.
Show that the criterion reduces to Snedecor's $F$.

11.7. Repeat Exercise 11.6 when the samples are from the populations
$N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$.

11.8. Given a random sample of size $n$ from $N(\mu, \sigma^2)$. Show that the $\lambda$
criterion for testing the null hypothesis $H_0\colon \sigma^2 = \sigma_0^2$, $\mu$ unspecified, reduces
to $\chi^2$. The entire parameter space is determined so that both $\mu$ and
$\sigma^2$ are unspecified.

11.9. Repeat Example 11.4 when the null hypothesis to be tested is
$H_0\colon \mu = \mu_0$.

11.10. Repeat Example 11.3 when the random sample of $n$ is from
$N(\mu, \sigma^2)$ and $\sigma^2$ is unspecified by $H_0$.

11.11. Given two random samples of sizes $n_1$ and $n_2$ from the normal
populations $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$. Find the $\lambda$ criterion for testing the
null hypothesis $H_0\colon \mu_1 = \mu_2 = \mu$, $\sigma^2$ unspecified. The entire parameter
space is three-dimensional, with coordinates $(\mu_1, \mu_2, \sigma^2)$. The subspace $\omega$
is two-dimensional, with coordinates $(\mu, \sigma^2)$.

CHAPTER 12

SPECIAL USES OF CHI-SQUARE 


12.1. Introduction. Chi-square has found many uses in applied
statistics in the testing of hypotheses. Some of these uses are exact,
that is, the proposed test criterion follows the chi-square distribution
exactly.

In Chap. 7 it was shown that $(n - 1)s^2/\sigma^2$, where

$$s^2 = \sum (X - \bar X)^2/(n - 1)$$

is evaluated for a random sample $\{X_i\}$ of $n$ from $N(\mu, \sigma^2)$, is distributed as
$\chi^2$ with $(n - 1)$ degrees of freedom. Methods were described and
examples given in Chaps. 10 and 11 for using the chi-square distribution to set
confidence limits for $\sigma^2$ and to test a hypothesis concerning specified values
of $\sigma^2$.

It is proposed to discuss in this chapter the theory behind other of the
important exact and approximate uses of chi-square in making tests of
hypotheses in applied statistics.

12.2. Goodness of Fit. Suppose that we have $k$ classes with $X_1$,
$X_2, \ldots, X_k$ observations in the respective classes. The expected
values in the corresponding classes are specified a priori to be $m_1, m_2,
\ldots, m_k$, and $\sum_{i=1}^{k} X_i = \sum_{i=1}^{k} m_i = n$. Then $p_i = m_i/n$ is the probability
of an observation falling in the $i$th class. We wish to test the hypothesis
that the sample distribution in the classes might have come from a
population with the particular set of $m$'s.

In Example 5.14 it was shown that if an event could take any one of $k$
values $y_1, y_2, \ldots, y_k$ with respective probabilities

$$p_1, p_2, \ldots, p_k \qquad \left(\sum_{i=1}^{k} p_i = 1\right),$$

the probability that out of $n$ independent trials $y_1$ would appear $X_1$ times,
$y_2$ would appear $X_2$ times, $\ldots$, $y_k$ would appear $X_k$ times
$\left(\sum_{i=1}^{k} X_i = n\right)$ is

$$f\{X_i\} = n! \prod_{i=1}^{k} \frac{p_i^{X_i}}{X_i!}.$$

Now, this may be written as

$$f\{X_i\} = \frac{\displaystyle\prod_{i=1}^{k} \frac{(np_i)^{X_i} e^{-np_i}}{X_i!}}{\displaystyle\frac{n^{n} e^{-n}}{n!}}.$$

If we identify this last expression with

$$P(A \mid B) = \frac{P(AB)}{P(B)},$$

we see that $f\{X_i\}$ is the probability of obtaining a set of $k$ independent
Poisson variates, subject to the condition or restriction that the sum of
the $k$ Poisson variates is equal to $n$. Hence, our problem is to find such a
distribution.

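First, we shall show that the Poisson distribution with large $m$
approaches a normal distribution. Let

$$f(X; m) = \frac{e^{-m} m^{X}}{X!}.$$

Then

$$\log f(X; m) = -m + X \log m - \log X!,$$

where

$$\log X! = \log \sqrt{2\pi} + (X + \tfrac{1}{2}) \log X - X + \frac{1}{12X} + \cdots$$

by Stirling's approximation. Let $X = m + \delta$; then

$$\log f(X; m) = -m + m \log m + \delta \log m - \log \sqrt{2\pi} - (m + \delta + \tfrac{1}{2}) \log m$$

$$\qquad - (m + \delta + \tfrac{1}{2})\left(\frac{\delta}{m} - \frac{\delta^2}{2m^2} + \frac{\delta^3}{3m^3} - \cdots\right) + (m + \delta) - \cdots = -\log \sqrt{2\pi m} - \frac{\delta^2}{2m} + O(m^{-1/2}),$$

where $O(m^{-1/2})$ means that the remaining terms are of order $m^{-1/2}$ or smaller.
Hence,

$$f(X; m) = \frac{1}{\sqrt{2\pi m}}\, e^{-\delta^2/2m}\, [1 + O(m^{-1/2})]$$

or, for large $m$, $X$ is distributed approximately as $N(m, m)$.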
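The quality of this approximation is easy to inspect numerically. The Python sketch below is an illustration only, with function names of our own; it compares the Poisson probability with the $N(m, m)$ density at $X = m$ and at one standard deviation on either side, for a moderate and a large value of $m$.

```python
import math

def poisson_pmf(x, m):
    """exp(-m) * m**x / x!, evaluated through logarithms to avoid overflow."""
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))

def normal_pdf(x, m):
    """Density of N(m, m) at x."""
    return math.exp(-(x - m) ** 2 / (2.0 * m)) / math.sqrt(2.0 * math.pi * m)

def worst_relative_error(m):
    """Largest relative error at X = m - sqrt(m), m, m + sqrt(m)."""
    root = int(round(math.sqrt(m)))
    errors = []
    for x in (m - root, m, m + root):
        errors.append(abs(poisson_pmf(x, m) / normal_pdf(x, m) - 1.0))
    return max(errors)

for m in (100, 10000):
    print(m, round(worst_relative_error(m), 4))
```

The relative error shrinks roughly in proportion to $m^{-1/2}$, in keeping with the order of the neglected terms.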


We may now write the joint distribution of the $X$'s for large $m_i$ as

$$\prod_{i=1}^{k} \frac{e^{-(X_i - m_i)^2/2m_i}}{\sqrt{2\pi m_i}}$$

subject to the linear restriction

$$\sum_{i=1}^{k} (X_i - m_i) = 0,$$

where $\sum_{i=1}^{k} X_i = n$. Consider the transformation

$$d_i = \frac{X_i - m_i}{\sqrt{m_i}},$$

where $m_i = np_i$. Using results of Example 5.14, it may be shown that

$$\sigma_i^2 = \frac{n - m_i}{n}, \qquad \sigma_{ij} = E(d_i d_j) = -\frac{\sqrt{m_i m_j}}{n}.$$

It has been shown that the $d_i$ approach normal variates as $n$ is increased.
Hence, in the limit

$$\sum_{i=1}^{k} d_i^2 = \sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}$$

would be distributed as $\chi^2$ with $k$ degrees of freedom, except for the one
linear restriction $\sum (X_i - m_i) = 0$. This is equivalent to $\sum d_i \sqrt{m_i} = 0$,
which indicates explicitly that the $d_i$ are not independent.

By the use of completely orthogonal linear forms, it may be shown that
in the limit, $\sum d_i^2$ is actually distributed as $\chi^2$ with $(k - 1)$ degrees of
freedom. To this end consider the case $k = 2$ ($n = m_1 + m_2$). Let

$$z_1 = (d_1 \sqrt{m_1} + d_2 \sqrt{m_2})/\sqrt n = 0,$$

$$z_2 = (b_1 d_1 + b_2 d_2),$$

where $b_1$ and $b_2$ are to be chosen so that $\sigma_{z_2}^2 = 1$ and $\sigma_{z_1 z_2} = 0$. It follows
that

$$\sigma_{z_1 z_2} = \frac{b_1 m_2 \sqrt{m_1} + b_2 m_1 \sqrt{m_2} - (b_2 \sqrt{m_1} + b_1 \sqrt{m_2}) \sqrt{m_1 m_2}}{n \sqrt n} = 0,$$

$$\sigma_{z_2}^2 = \frac{b_1^2 m_2 + b_2^2 m_1 - 2 b_1 b_2 \sqrt{m_1 m_2}}{n} = 1.$$

From the first equation, which is satisfied identically, we see that the value
of one of the $b_i$ can be chosen arbitrarily; then this value is inserted in the
second equation to solve for the other $b_i$. Suppose we let $b_1 = \sqrt{m_2/n}$;
then $b_2 = -\sqrt{m_1/n}$.

Also, $d_1^2 + d_2^2 = z_1^2 + z_2^2 = z_2^2$. Since $z_2$ approaches a variate which is
$N(0,1)$, $d_1^2 + d_2^2 = z_2^2$ approaches a $\chi^2$ variate with 1 degree of freedom.
For $k$ classes, we set up the following completely orthogonal linear forms

$$z_1 = \frac{d_1 \sqrt{m_1} + d_2 \sqrt{m_2} + \cdots + d_k \sqrt{m_k}}{\sqrt n} = 0,$$

$$z_2 = \frac{d_1 \sqrt{m_1 m_2} - m_1 d_2}{\sqrt{m_1 (m_1 + m_2)}},$$

$$z_3 = \frac{d_1 \sqrt{m_1 m_3} + d_2 \sqrt{m_2 m_3} - (m_1 + m_2) d_3}{\sqrt{(m_1 + m_2)(m_1 + m_2 + m_3)}},$$

$$\cdots$$

$$z_k = \frac{d_1 \sqrt{m_1 m_k} + d_2 \sqrt{m_2 m_k} + \cdots + d_{k-1} \sqrt{m_{k-1} m_k} - \left(\sum_{i=1}^{k-1} m_i\right) d_k}{\sqrt{\left(\sum_{i=1}^{k-1} m_i\right)\left(\sum_{i=1}^{k} m_i\right)}}.$$

Since $\sigma_{ij} = -\sqrt{m_i m_j}/n$, $i \ne j$, and $\sigma_i^2 = (n - m_i)/n$ for the $d$'s, it follows
that the $\{z_i\}$, $i = 2, \ldots, k$, are NID(0,1), and since $z_1 = 0$,

$$\sum_{i=1}^{k} d_i^2 = \sum_{i=1}^{k} z_i^2 = \sum_{i=2}^{k} z_i^2.$$

But the last expression on the right is distributed as $\chi^2$ with $(k - 1)$
degrees of freedom, and hence so is

$$\sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}.$$

A test of the hypothesis that the expected values in the several classes
are given by the specified $m_i$ ($i = 1, 2, \ldots, k$) is approximated by
obtaining the probability of getting a $\chi^2$ with $(k - 1)$ degrees of freedom greater
than the computed

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}.$$

If this probability is unusually small, say .05, we may choose to reject the
hypothesis.
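That the linear forms $z_1, z_2, \ldots, z_k$ of this section are completely orthogonal can be checked numerically for any set of expectations. The sketch below is our own illustration: the class expectations and counts are hypothetical, and the function `orthogonal_forms` simply writes out the coefficients of the $d_i$ in each $z$ as given above.

```python
import math

def orthogonal_forms(m):
    """Coefficient rows c[r][i] such that z_(r+1) = sum_i c[r][i] * d_i."""
    n = sum(m)
    k = len(m)
    rows = [[math.sqrt(mi / n) for mi in m]]        # z_1
    for r in range(1, k):                           # z_2, ..., z_k
        head = sum(m[:r])                           # m_1 + ... + m_r
        norm = math.sqrt(head * (head + m[r]))
        row = [math.sqrt(m[i] * m[r]) / norm for i in range(r)]
        row.append(-head / norm)
        row.extend([0.0] * (k - r - 1))
        rows.append(row)
    return rows

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

expected = [50.0, 30.0, 15.0, 5.0]   # hypothetical m_i
observed = [44.0, 37.0, 12.0, 7.0]   # hypothetical X_i with the same total

rows = orthogonal_forms(expected)
d = [(x - m) / math.sqrt(m) for x, m in zip(observed, expected)]
z = [dot(row, d) for row in rows]

sum_d2 = sum(v * v for v in d)
sum_z2 = sum(v * v for v in z[1:])   # z_1 is zero and contributes nothing
print(round(z[0], 12), round(sum_d2, 4), round(sum_z2, 4))
```

Each pair of rows has zero inner product and each row has unit length, so the sum of squares of the $d_i$ is carried over unchanged to $z_2, \ldots, z_k$.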




Example 12.1. Genetic data provide examples of the use of
"goodness of fit" tests. Some possible theoretical genetic ratios are
3:1, 1:1, 9:7, 15:1, and 63:1.

In a study of chlorophyll inheritance in corn, Lindstrom¹ found 98
green and 24 yellow seedlings in one progeny of 122 young corn plants.
Presumably green is dominant to yellow and is segregated in a ratio 3:1,
so that we should expect $m_1 = 91.5$ green seedlings and $m_2 = 30.5$ yellow
seedlings if this ratio is correct. To test this hypothesis, we calculate

$$\chi_0^2 = \frac{(98 - 91.5)^2}{91.5} + \frac{(24 - 30.5)^2}{30.5} = 1.847.$$

By interpolation, the probability of obtaining a chi-square value of this
size or larger on the assumption that the genetic ratio is 3:1 is .18.
Hence, the 3:1 ratio is not rejected as a possible fit to the data.
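The calculation of Example 12.1 may be sketched in Python as follows. The function names are our own, and the probability is obtained from the complementary error function, which gives the upper tail of chi-square exactly for 1 degree of freedom only; it replaces the interpolation in printed tables.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum over classes of (X_i - m_i)**2 / m_i."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))

def upper_tail_1df(x):
    """P(chi-square > x) for 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

observed = [98, 24]                  # green, yellow seedlings
n = sum(observed)
expected = [n * 3 / 4, n * 1 / 4]    # 3:1 ratio gives 91.5 and 30.5

chi2 = chi_square_statistic(observed, expected)
prob = upper_tail_1df(chi2)
print(round(chi2, 3), round(prob, 2))
```

The computed probability is a little above .17, in agreement with the interpolated value of about .18.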

Example 12.2. Federer² fitted a normal curve to the frequency
distribution below of rubber content (percentage) in 378 guayule plants.

Class center    1.5    2.5    3.5    4.5    5.5    6.5    7.5    8.5
Frequency         1      1      2     33    139    155     42      5

The mean and standard deviation of the 378 observations were found to
be 6.07 and .892, respectively. Using the tables for the normal curve,
Federer calculated the expected values for the respective classes set out
below:

Class                 < 4.0   4.0-5.0   5.0-6.0   6.0-7.0   7.0-8.0   > 8.0    Total
Observed frequency        4        33       139       155        42       5      378
Expected frequency      3.8      39.7     133.4     144.7      50.6     5.8    378.0

The frequencies of the first three classes in the first table have been pooled
in the second table in order that the number in any class be not less than 5
approximately, as suggested by Fisher. Using the formula developed in
this section Federer found $\chi_0^2 = 3.302$. The proof of the rule given by
Fisher for determining the degrees of freedom to be assigned $\chi_0^2$ for this
example is beyond the scope of this text. This rule states that the correct
number of degrees of freedom may be found by subtracting the total
number of restrictions imposed on the data from the total number of classes.
In the example this would be $6 - 3$, since the sum, mean, and standard
deviation of the sample and hypothetical curve have been equated. The
probability of obtaining a chi-square value as large as or larger than 3.30
with 3 degrees of freedom lies between .5 and .3. Hence, we have no
evidence to reject the hypothesis that the rubber contents in the 378
guayule plants are normally distributed.

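12.3. Contingency Tables. Consider the ordinary $2 \times 2$ contingency
table used in applied statistics:

                   Expectations                                  Observations
          $b_1$         $b_2$         $b$                 $b_1$       $b_2$       $b$
$a_1$   $np_ap_b$    $np_aq_b$    $np_a$         $a_1$   $n_{11}$    $n_{12}$    $n_{1.}$
$a_2$   $nq_ap_b$    $nq_aq_b$    $nq_a$         $a_2$   $n_{21}$    $n_{22}$    $n_{2.}$
$a$     $np_b$       $nq_b$       $n$            $a$     $n_{.1}$    $n_{.2}$    $n_{..}$

In this case $k = 4$, but we have more than one linear restriction. If
there is no a priori knowledge of the values of $p_a$ and $p_b$, we usually use
their maximum-likelihood estimates, which are

$$np_a = n_{1.} \text{ or } nq_a = n_{2.}; \qquad np_b = n_{.1} \text{ or } nq_b = n_{.2}; \qquad n_{..} = n.$$

These relations constitute three distinct linear restrictions.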
These relations constitute three distinct linear restrictions. 

Using the methods of Sec. 12.2, it can be shown that, if the given
number of restrictions $r$ can be reduced to $r$ orthogonal restrictions, then

$$\sum_{i=1}^{k} d_i^2$$

is distributed as $\chi^2$ with $(k - r)$ degrees of freedom. To this end we set
up the following:

$$z_1 = d_{11} \sqrt{p_a p_b} + d_{12} \sqrt{p_a q_b} + d_{21} \sqrt{q_a p_b} + d_{22} \sqrt{q_a q_b} = 0,$$

$$z_2 = \frac{q_a d_{11} \sqrt{p_a p_b} + q_a d_{12} \sqrt{p_a q_b} - p_a d_{21} \sqrt{q_a p_b} - p_a d_{22} \sqrt{q_a q_b}}{\sqrt{p_a q_a}} = 0,$$

$$z_3 = \frac{q_b d_{11} \sqrt{p_a p_b} - p_b d_{12} \sqrt{p_a q_b} + q_b d_{21} \sqrt{q_a p_b} - p_b d_{22} \sqrt{q_a q_b}}{\sqrt{p_b q_b}} = 0,$$

where

$$d_{11} = \frac{n_{11} - np_ap_b}{\sqrt{np_ap_b}}, \text{ etc.}$$

By methods similar to those of Sec. 12.2 it can be shown that $d_{ij}$
approaches a $N(0,1)$ variate for large $n$.


Since

$$E(d_{11} z_1) = \sqrt{p_a p_b}\,(1 - p_a p_b) - \sqrt{p_a q_b\, p_a p_b\, p_a q_b} - \sqrt{q_a p_b\, p_a p_b\, q_a p_b} - \sqrt{q_a q_b\, p_a p_b\, q_a q_b}$$

$$\qquad = \sqrt{p_a p_b}\,[1 - (p_a p_b + p_a q_b + q_a p_b + q_a q_b)] = 0,$$

and similarly for the other $d$'s and $z$'s, then $E(z_i z_j) = 0$ for $i \ne j$.

A fourth orthogonal linear form is found to be

$$z_4 = d_{11} \sqrt{q_a q_b} - d_{12} \sqrt{q_a p_b} - d_{21} \sqrt{p_a q_b} + d_{22} \sqrt{p_a p_b} \ne 0.$$

It can be shown that $z_4$ is also $N(0,1)$ and is independent of the other three.
It follows that $\sum d_{ij}^2 = z_4^2 = \chi^2$ with 1 degree of freedom. It should be noted that
the number of degrees of freedom is reduced by one for every parameter
estimated from the data. If $r$ distinct parameters are estimated, the
number of degrees of freedom for $\chi^2$ is $(k - r - 1)$ (1 degree of freedom
was also lost in making $\sum_{i=1}^{k} X_i = \sum_{i=1}^{k} m_i$).

To make the test of the independence of the two classifications in the
two-way table, we calculate

$$\chi^2 = \frac{(n_{11} - np_ap_b)^2}{np_ap_b} + \cdots + \frac{(n_{22} - nq_aq_b)^2}{nq_aq_b}$$

and compare this calculated $\chi^2$ with the tabular $\chi^2$ of 1 degree of freedom
at the 5 per cent or 1 per cent points. If the calculated $\chi^2$ is greater than
the tabular $\chi^2$ at either per cent point, we say that we have evidence
from this sample that the two classifications are not independent. In
other words, there would be evidence of an interaction between the two
classifications.

More precisely, we are evaluating the probability that an observed set of
frequencies $X_1, X_2, \ldots, X_k$ could have resulted from a multinomial
distribution with probabilities $p_1, p_2, \ldots, p_k$. The requirements that
must be met in order that the $\chi^2$ approximation may be used to evaluate
this probability are (i) the frequencies follow the multinomial distribution,
(ii) the expectations are large enough so that the normal approximation is
satisfactory, and (iii) any estimation of the p's should be efficient. For
discussions related to this problem consult Cochran.

The second paper by Cochran referred to above discusses a "correction
for continuity" which Yates had proposed earlier and which was
subsequently mentioned in texts by Fisher, Snedecor, and others. For the
2 × 2 contingency table, Yates suggested that .5 be subtracted from the
absolute value of each deviation in computing $\chi^2$ in order to correct for a
slight bias in determining the true probability levels. This slight bias
arises from the fact that the $\chi^2$ distribution is continuous, whereas the
frequencies are discrete. The correction for continuity is not very
important for $\chi^2$ with more than a single degree of freedom and is never to be
used when adding $\chi^2$ values.

Example 12.3. The $F_2$ progeny of a barley cross (Robertson) segregated
in the following manner:

                 F        f      Total
    V          1,178     273     1,451
    v            291     156       447
    Total      1,469     429     1,898

We calculate

$$\chi^2 = \frac{(54.97)^2}{1{,}123.03} + \frac{(-54.97)^2}{327.97} + \frac{(-54.97)^2}{345.97} + \frac{(54.97)^2}{101.03} \approx 50.5.$$

The probability of obtaining a chi-square this large or larger on the
assumption of independence of the classification is extremely small.
Hence, there is considerable evidence in favor of association of these two
attributes.
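
The arithmetic of Example 12.3 can be checked with a short script. This is only a sketch: the function name `chi_square_2x2` and its `yates` flag are illustrative choices, not part of the text, and the expected values are taken from the marginal totals as described in Sec. 12.3.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square (1 d.f.) for a 2 x 2 table [[n11, n12], [n21, n22]].

    Expected values come from the marginal totals (the maximum-likelihood
    estimates).  With yates=True, .5 is subtracted from the absolute value
    of each deviation before squaring.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(table[i][j] - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    return chi2


barley = [[1178, 273], [291, 156]]  # observed counts from Example 12.3
chi2_plain = chi_square_2x2(barley)
chi2_yates = chi_square_2x2(barley, yates=True)
print(f"chi-square = {chi2_plain:.2f}, corrected for continuity = {chi2_yates:.2f}")
```

Because the script carries unrounded expected values, its result can differ in the second decimal from a hand calculation that rounds each deviation to 54.97.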

12.4. Homogeneity of a Binomial Series. Suppose that our data
consist of $X_1, X_2, \ldots, X_k$ successes out of each of n independent trials.
In tabular form we have

                                                        Total
    Successes     X_1       X_2      . . .    X_k       k X̄
    Failures    n - X_1   n - X_2    . . .  n - X_k     k(n - X̄)
    Trials         n         n       . . .     n        nk

We wish to test the hypothesis that the probability at every trial is p
(constant). The probability of obtaining the particular sample values on
the assumption that the hypothesis is true is

$$\prod_{i=1}^{k} C(n, X_i)\, p^{X_i} q^{n - X_i}.$$

The maximum-likelihood estimate of p is $\bar{X}/n$. Then the expected value
for each of the success cells is $n\hat{p} = \bar{X}$ and for each of the failure cells is
$n\hat{q} = (n - \bar{X})$. Now, consider the expression

$$\chi^2 = \sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\bar{X}} + \sum_{i=1}^{k} \frac{(n - X_i - n + \bar{X})^2}{n - \bar{X}} = \frac{n \sum_{i=1}^{k} (X_i - \bar{X})^2}{\bar{X}(n - \bar{X})}.$$

We may write this in the form

$$\chi^2 = \frac{\sum_{i=1}^{k} (X_i - \bar{X})^2}{npq} \cdot \frac{npq}{\bar{X}\,(1 - \bar{X}/n)}.$$

Now, as $n \to \infty$, $\sum (X_i - \bar{X})^2/npq$ approaches the tabular $\chi^2$ with
(k - 1) degrees of freedom and $npq/[\bar{X}(1 - \bar{X}/n)]$ approaches one with a
negligible error. Hence an approximate test of homogeneity of a binomial
series may be made by using

$$\chi^2 = \frac{n \sum_{i=1}^{k} (X_i - \bar{X})^2}{\bar{X}(n - \bar{X})}$$

with (k - 1) degrees of freedom.

Example 12.4. Ten samples of 25 stalks of corn each, examined for
European corn-borer infestation, gave the following counts: 11, 7, 3, 8, 15,
2, 10, 21, 18, 9. Is the infestation random?

We calculate

$$\chi^2 = \frac{(25)(336.4)}{(10.4)(25 - 10.4)} = 55.39,$$

with 9 degrees of freedom. Upon consulting the tables we find this
value to be highly significant, and hence we have evidence that the
infestation is not random.
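
As a check on Example 12.4, the criterion of Sec. 12.4 can be computed directly from the counts. The function name below is an illustrative assumption; the formula is the one given in the text, $n \sum (X_i - \bar{X})^2 / [\bar{X}(n - \bar{X})]$.

```python
def binomial_homogeneity_chi2(successes, n):
    """Chi-square for homogeneity of a binomial series.

    successes: list of success counts X_1..X_k, each out of n trials.
    Returns (chi-square, degrees of freedom).
    """
    k = len(successes)
    mean = sum(successes) / k
    sum_sq_dev = sum((x - mean) ** 2 for x in successes)
    chi2 = n * sum_sq_dev / (mean * (n - mean))
    return chi2, k - 1


borer_counts = [11, 7, 3, 8, 15, 2, 10, 21, 18, 9]  # Example 12.4
binom_chi2, binom_df = binomial_homogeneity_chi2(borer_counts, n=25)
print(f"chi-square = {binom_chi2:.2f} with {binom_df} d.f.")
```

The sum of squared deviations for these counts is 336.4, which reproduces the value 55.39 quoted in the example.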

12.5. Homogeneity of a Poisson Series. The data consist of $X_1,
X_2, \ldots, X_k$. We wish to test the hypothesis that each of the X's
comes from a Poisson distribution with the same m. The probability of
obtaining the particular sample values on the assumption that the
hypothesis is true is

$$\prod_{i=1}^{k} \frac{e^{-m} m^{X_i}}{X_i!}.$$

It can be shown that the maximum-likelihood estimator for m is $\bar{X}$.
Consider the expression

$$\sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\bar{X}} = \frac{\sum_{i=1}^{k} (X_i - \bar{X})^2 / m}{\bar{X}/m}.$$

Now, as $n \to \infty$, $\sum (X_i - \bar{X})^2/m$ approaches the tabular $\chi^2$ with (k - 1)
degrees of freedom and $\bar{X}/m$ approaches 1. Hence we may use

$$\chi^2 = \sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\bar{X}}$$

with (k - 1) degrees of freedom as an approximate test of homogeneity
of a Poisson series.

Example 12.5. The number of wireworms in the check plots of a
Latin square were 2, 6, 4, 9, 8. Is the infestation random?

Assuming that the counts are distributed in a Poisson fashion, we
calculate $\chi^2 = 5.66$ with 4 degrees of freedom. Upon consulting the
tables we find the probability of obtaining such a chi-square value on the
assumption of homogeneity to be large; hence we have no evidence that
the null hypothesis is false.
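
The value in Example 12.5 follows from the sum of squared deviations divided by the mean. A minimal sketch (the function name is an assumption made for illustration):

```python
def poisson_homogeneity_chi2(counts):
    """Chi-square for homogeneity of a Poisson series: sum (X_i - mean)^2 / mean.

    Returns (chi-square, degrees of freedom).
    """
    k = len(counts)
    mean = sum(counts) / k
    chi2 = sum((x - mean) ** 2 for x in counts) / mean
    return chi2, k - 1


wireworms = [2, 6, 4, 9, 8]  # Example 12.5
pois_chi2, pois_df = poisson_homogeneity_chi2(wireworms)
print(f"chi-square = {pois_chi2:.2f} with {pois_df} d.f.")
```

Here the mean is 5.8 and the sum of squared deviations is 32.8, so the ratio is 32.8/5.8.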

Further discussion relative to the material in Secs. 12.4 and 12.5 may
be found in Fisher and Cochran.

12.6. Combination of Probabilities. Suppose that the following
information has been obtained from two experiments:

Experiment 1. A - B = 2.15, estimated standard error is 1.28,
degrees of freedom are 30, t = 1.680, and the single-tailed probability is
$P_1 = .0544$.

Experiment 2. Out of 10 trials A was superior to B 8 times.

$$P_2 = (\tfrac{1}{2})^{10} + 10(\tfrac{1}{2})^{9}(\tfrac{1}{2}) + 45(\tfrac{1}{2})^{8}(\tfrac{1}{2})^{2} = .0547.$$

Can we make a test of the hypothesis that A = B, with the alternate
hypothesis that A > B, by combining the information from these two
experiments?

Let X be distributed as $f(X)\,dX$, $-\infty < X < \infty$; then it is possible to
show that $-2 \log_e p_i$ follows the $\chi^2$ distribution with 2 degrees of freedom,
where

$$p_i = \int_{-\infty}^{X_i} f(t)\,dt.$$

To this end we find the distribution of $p_i$ to be

$$f(X)\,\frac{dX}{dp_i}\,dp_i.$$

But

$$\frac{dp_i}{dX} = f(X);$$

hence the distribution of $p_i$ is $dp_i$. Also, when $X \to -\infty$, $p_i \to 0$, and
when $X \to \infty$, $p_i \to 1$. Let

$$u = \log_e p_i;$$

then the distribution of u is

$$e^{u}\,du, \qquad -\infty < u < 0.$$

Let

$$z = -2u, \qquad dz = -2\,du;$$

then the distribution of $z = -2 \log_e p_i$ is

$$\tfrac{1}{2} e^{-z/2}\,dz, \qquad 0 < z < \infty.$$

But the distribution of $\chi^2$ with 2 degrees of freedom is

$$\tfrac{1}{2} e^{-\chi^2/2}\,d(\chi^2).$$

Hence, $z = -2 \log_e p_i$ is distributed as $\chi^2$ with 2 degrees of freedom.

Since the $p_i$ are independent, the sum of k such $\chi^2$ values is distributed
as $\chi^2$ with 2k degrees of freedom.

Applying the method developed above to the two experiments, we find

$$\log_{10} P_1 = \bar{2}.7356$$
$$\log_{10} P_2 = \bar{2}.7380$$
$$\text{sum} = \bar{3}.4736 = -2.5264$$

and

$$\chi^2 = -2(-2.5264)(2.3026) = 11.634,$$

with 4 degrees of freedom. The probability of obtaining a $\chi^2$ as large as
or larger than this value on the assumption that the hypothesis tested is
true is .02.
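
The combination can be reproduced numerically. The sketch below works with natural logarithms directly instead of the common-log tables used above; the function name is an assumption. For an even number of degrees of freedom, 2k, the upper-tail probability of $\chi^2$ has the closed form $e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!$, so no statistical library is needed.

```python
import math


def combine_probabilities(p_values):
    """Combine independent single-tailed probabilities via -2 * sum(log p).

    Returns (chi-square, degrees of freedom, upper-tail probability).
    """
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    half = chi2 / 2.0
    # Closed-form upper tail of chi-square with 2k degrees of freedom.
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return chi2, 2 * k, tail


comb_chi2, comb_df, comb_p = combine_probabilities([0.0544, 0.0547])
print(f"chi-square = {comb_chi2:.3f}, d.f. = {comb_df}, P = {comb_p:.3f}")
```

The combined probability agrees with the value of about .02 read from the tables.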

12.7. Bartlett's Test of Homogeneity of Variances. Suppose we have k
independent sample variances $v_i$ with $n_i$ degrees of freedom each from
populations which are $N(\mu_i, \sigma_i^2)$. It is desired to test

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2;$$

in other words, that each $v_i$ is an estimate of the same population variance
$\sigma^2$. Bartlett has proposed the criterion Q/l, which can be shown to be
approximately distributed as $\chi^2$ with (k - 1) degrees of freedom, as a
test of $H_0$. In this expression




$$Q = n \log v - \sum_{i=1}^{k} n_i \log v_i,$$

$$l = 1 + \frac{1}{3(k - 1)} \left[ \sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{n} \right],$$

where $n = \sum n_i$ and $v = \sum n_i v_i / n$.
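
A minimal sketch of the computation of Q and l follows. The function name and the two illustrative variances are assumptions made here, not data from the text; natural logarithms are used throughout.

```python
import math


def bartlett_criterion(variances, dfs):
    """Bartlett's criterion for homogeneity of k sample variances.

    variances: sample variances v_i; dfs: their degrees of freedom n_i.
    Returns (Q, l, Q / l, degrees of freedom k - 1).
    """
    k = len(variances)
    n = sum(dfs)
    pooled = sum(ni * vi for ni, vi in zip(dfs, variances)) / n
    q = n * math.log(pooled) - sum(ni * math.log(vi) for ni, vi in zip(dfs, variances))
    l = 1.0 + (sum(1.0 / ni for ni in dfs) - 1.0 / n) / (3.0 * (k - 1))
    return q, l, q / l, k - 1


# Hypothetical illustration: two variances, each with 10 degrees of freedom.
q, l, q_over_l, bart_df = bartlett_criterion([1.0, 4.0], [10, 10])
print(f"Q = {q:.4f}, l = {l:.4f}, Q/l = {q_over_l:.4f}, d.f. = {bart_df}")
```

When all the $v_i$ are equal, Q is zero; the ratio Q/l is then compared with the tabular $\chi^2$ for (k - 1) degrees of freedom.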


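The proof that Q/l is approximately distributed as $\chi^2$ with (k - 1)
degrees of freedom will be outlined here. Using the distribution of $v_i$
(with $n_i$ degrees of freedom), we find that the moment generating function
and cumulant generating function for $\log v_i$ are

$$m(t) = \left( \frac{2\sigma_i^2}{n_i} \right)^{t} \frac{\Gamma\!\left(\frac{n_i}{2} + t\right)}{\Gamma\!\left(\frac{n_i}{2}\right)},$$

$$K(t) = t \log \frac{2\sigma_i^2}{n_i} + \log \frac{\Gamma\!\left(\frac{n_i}{2} + t\right)}{\Gamma\!\left(\frac{n_i}{2}\right)}.$$

From Stirling's formula for approximating factorials,

$$\Gamma(x) = \sqrt{2\pi}\, e^{-(x-1)} (x - 1)^{x - \frac{1}{2}} \left[ 1 + \frac{1}{12(x - 1)} + \cdots \right].$$

Hence, omitting the subscript i for convenience,

$$K(t) = t \left( \log \frac{2\sigma^2}{n} - 1 \right) + \left( t + \frac{n-1}{2} \right) \log \left( \frac{2t + n - 2}{2} \right) - \left( \frac{n-1}{2} \right) \log \left( \frac{n-2}{2} \right)$$
$$\qquad + \log \left[ 1 + \frac{1}{6(2t + n - 2)} \right] - \log \left[ 1 + \frac{1}{6(n - 2)} \right],$$

and, since

$$\log (1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots,$$

$$K(t) = t\,[\log \sigma^2 - 1] + \left[ t + \frac{n-1}{2} \right] \left[ \frac{2t}{n-2} - \frac{4t^2}{2(n-2)^2} + \frac{8t^3}{3(n-2)^3} - \cdots \right]$$
$$\qquad - t \left[ \frac{2}{n} + \frac{4}{2n^2} + \frac{8}{3n^3} + \cdots \right] + \left[ \frac{1}{6(2t + n - 2)} - \frac{1}{6(n-2)} \right].$$

Upon simplifying the last bracket to

$$\frac{-t}{3(n-2)^2 \left( 1 + \frac{2t}{n-2} \right)},$$

we may write the cumulants as, upon replacing the subscript i,

$$\kappa_1 = \log \sigma_i^2 - 1 + \frac{n_i - 1}{n_i - 2} - \frac{2}{n_i} - \frac{2}{n_i^2} - \frac{1}{3(n_i - 2)^2} = \log \sigma_i^2 - \frac{1}{n_i} - \frac{1}{3n_i^2},$$

neglecting terms of order $n_i^{-3}$,

$$\kappa_2 = 2 \left[ \frac{2}{n_i - 2} - \frac{n_i - 1}{(n_i - 2)^2} + \frac{2}{3(n_i - 2)^3} \right] = \frac{2}{n_i^2}\,(n_i + 1),$$

and

$$\kappa_3 = 6 \left[ \frac{4(n_i - 1)}{3(n_i - 2)^3} - \frac{2}{(n_i - 2)^2} - \frac{4}{3(n_i - 2)^4} \right] = -\frac{4}{n_i^4}\,(n_i^2 + 2n_i + 2).$$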

The above results may be used to find the cumulants of Q, which can
be shown to be

$$\kappa_r(Q) = -(-n)^r \kappa_r(\log v) + \sum_{i=1}^{k} (-n_i)^r \kappa_r(\log v_i).$$

Bartlett obtains the above results by the following arguments: Now,

$$Q = \sum_{i=1}^{k} n_i \log \frac{v}{v_i}.$$

It can be shown that the distribution of $v/v_i$ is independent of v; hence

$$K[Q + (-n \log v)] = K(Q) + K(-n \log v),$$

or

$$K(Q) = K[Q + (-n \log v)] - K(-n \log v) = K\left[ \sum_{i=1}^{k} (-n_i) \log v_i \right] - K[-n \log v].$$





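Hence, the cumulants for Q are, to the order of approximation used above,

$$\kappa_1(Q) = (k - 1)\,l, \qquad \kappa_2(Q) = 2(k - 1)\,l^2, \qquad \kappa_3(Q) = 8(k - 1)\,l^3,$$

which are also the first three cumulants for the distribution of $l\chi^2$ with
(k - 1) degrees of freedom. Hence, Q/l will be approximately
distributed as $\chi^2$ with (k - 1) degrees of freedom.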

12.8. Test of a Second-order Interaction in a 2 × 2 × 2 Contingency
Table. Suppose we extend the 2 × 2 contingency table of Sec. 12.3 to
the 2 × 2 × 2 table below:

                   A_1                        A_2
             B_1         B_2            B_1         B_2
          C_1   C_2   C_1   C_2      C_1   C_2   C_1   C_2     Totals
          p_1   p_2   p_3   p_4      p_5   p_6   p_7   p_8       1
          m_1   m_2   m_3   m_4      m_5   m_6   m_7   m_8       n
          x_1   x_2   x_3   x_4      x_5   x_6   x_7   x_8       n

The three classifications are designated by A, B, and C, respectively,
while $p_i$, $m_i$, and $x_i$ are the probabilities, expected values, and observed
values, respectively, corresponding to the respective $C_i$ subclasses.

The first-order interactions such as BC were discussed in Sec. 12.3.
The BC interaction for $A_1$ is defined as

$$\frac{p_1/p_2}{p_3/p_4}$$

and for $A_2$ as

$$\frac{p_5/p_6}{p_7/p_8}.$$

The null hypothesis tested for a 2 × 2 contingency table is

$$p_1/p_2 = p_3/p_4,$$

which is equivalent to

$$\frac{p_1/p_2}{p_3/p_4} = 1;$$

that is, the interaction is unity.

The null hypothesis for testing the existence of an ABC interaction is
that the BC interaction is the same for both $A_1$ and $A_2$, or symbolically
that

$$\frac{p_1/p_2}{p_3/p_4} = \frac{p_5/p_6}{p_7/p_8},$$

which reduces to

$$p_1 p_4 p_6 p_7 = p_2 p_3 p_5 p_8.$$


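Estimates of the $p_i$'s or $m_i = p_i n$ may be obtained by the method of
maximum likelihood. The probability of obtaining a particular set of
observed $x_i$'s is given by

$$P_s = \frac{n!}{x_1!\, x_2! \cdots x_8!}\; p_1^{x_1} p_2^{x_2} \cdots p_8^{x_8}.$$

It follows that

$$P_s \propto \prod_{i=1}^{8} m_i^{x_i}.$$

Hence, the logarithm of the likelihood function is, apart from a constant,

$$\sum_{i=1}^{8} x_i \log m_i,$$

which is subject to the restrictions

(i) $m_1 m_4 m_6 m_7 = m_2 m_3 m_5 m_8$,

(ii) $\sum_{i=1}^{8} m_i = n$.

In order to determine estimators of the $m_i$, we form the augmented
function

$$U = \sum_{i=1}^{8} x_i \log m_i + k\,(m_1 m_4 m_6 m_7 - m_2 m_3 m_5 m_8) - l \left( \sum_{i=1}^{8} m_i - n \right)$$

and obtain the partial derivatives

$$\frac{\partial U}{\partial m_1} = \frac{x_1}{m_1} + \frac{k\lambda}{m_1} - l = 0,$$

$$\frac{\partial U}{\partial m_2} = \frac{x_2}{m_2} - \frac{k\mu}{m_2} - l = 0,$$

$$\cdots$$

$$\frac{\partial U}{\partial m_8} = \frac{x_8}{m_8} - \frac{k\mu}{m_8} - l = 0,$$

where we have set

$$\lambda = m_1 m_4 m_6 m_7, \qquad \mu = m_2 m_3 m_5 m_8.$$

The following equations may now be obtained:

$$x_1 = m_1 l - k\lambda, \qquad x_2 = m_2 l + k\mu,$$
$$x_4 = m_4 l - k\lambda, \qquad x_3 = m_3 l + k\mu,$$
$$x_6 = m_6 l - k\lambda, \qquad x_5 = m_5 l + k\mu,$$
$$x_7 = m_7 l - k\lambda, \qquad x_8 = m_8 l + k\mu.$$

Upon adding the eight equations, we obtain

$$n = ln - 4k\lambda + 4k\mu,$$

or

$$n = ln,$$

since $\lambda = \mu$, and hence l = 1. Using this result, and setting $k\lambda = k\mu = \delta$,
we obtain the estimators

$$\hat{m}_1 = x_1 + \delta, \qquad \hat{m}_2 = x_2 - \delta,$$
$$\hat{m}_4 = x_4 + \delta, \qquad \hat{m}_3 = x_3 - \delta,$$
$$\hat{m}_6 = x_6 + \delta, \qquad \hat{m}_5 = x_5 - \delta,$$
$$\hat{m}_7 = x_7 + \delta, \qquad \hat{m}_8 = x_8 - \delta.$$

Multiplying both sides of the equality tested by the null hypothesis by
$n^4$, we obtain

$$m_1 m_4 m_6 m_7 = m_2 m_3 m_5 m_8.$$

An equation for obtaining a value for $\delta$ may be obtained by substitution,
that is,

$$(x_1 + \delta)(x_4 + \delta)(x_6 + \delta)(x_7 + \delta) = (x_2 - \delta)(x_3 - \delta)(x_5 - \delta)(x_8 - \delta).$$

From $\delta$ and the $x_i$'s, values for the $m_i$ may be obtained. A test of the
null hypothesis may now be performed by using the criterion

$$\chi^2 = \sum_{i=1}^{8} \frac{(x_i - \hat{m}_i)^2}{\hat{m}_i} = \delta^2 \sum_{i=1}^{8} \frac{1}{\hat{m}_i},$$

where the associated degrees of freedom are determined by subtracting the
number of independent parameters estimated from the total number of
classes, that is, 8 - 1 - 6, or 1, in this case.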
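
The equation for $\delta$ is a quartic and is most easily solved numerically. The sketch below uses bisection, which is one convenient choice and not necessarily the method Bartlett used. The function name is an assumption, and all eight counts are assumed positive. The illustrative counts are those of Exercise 12.11, with the assignment A = alive or dead, B = long or short, C = at once or in spring; that labeling is also an assumption made here.

```python
def second_order_interaction(x, tol=1e-10):
    """Solve (x1+d)(x4+d)(x6+d)(x7+d) = (x2-d)(x3-d)(x5-d)(x8-d) for d.

    x: the eight observed counts x1..x8, all assumed positive.
    Returns (delta, fitted values m1..m8, chi-square with 1 d.f.).
    """
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    plus = (x1, x4, x6, x7)    # cells whose fitted value is x + delta
    minus = (x2, x3, x5, x8)   # cells whose fitted value is x - delta

    def difference(d):
        left = 1.0
        for v in plus:
            left *= v + d
        right = 1.0
        for v in minus:
            right *= v - d
        return left - right

    # Between these bounds every fitted value is positive and the
    # difference increases steadily from negative to positive.
    lo, hi = -float(min(plus)), float(min(minus))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if difference(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    delta = (lo + hi) / 2.0
    signs = (1, -1, -1, 1, -1, 1, 1, -1)
    fitted = [xi + s * delta for xi, s in zip(x, signs)]
    chi2 = sum((xi - mi) ** 2 / mi for xi, mi in zip(x, fitted))
    return delta, fitted, chi2


counts = [156, 84, 107, 31, 84, 156, 133, 209]
delta, fitted, inter_chi2 = second_order_interaction(counts)
print(f"delta = {delta:.2f}, chi-square = {inter_chi2:.3f}")
```

For these counts the solution is close to the values $\delta = 5.1$ and uncorrected $\chi^2 = 2.274$ quoted in Exercise 12.11.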

EXERCISES 

12.1. (Some of Beall's data given by Neyman.) Fit a Poisson curve
to the following distribution of European corn borers in 120 groups of 8
hills each. Use the method illustrated in Example 12.2 to calculate a
"goodness of fit" chi-square.

    No. of borers         0   1   2   3   4   5   6   7   8   9  10  11  12
    Observed frequency   24  16  16  18  15   9   6   5   3   4   3   0   1

12.2. Use the method of Sec. 12.8 to develop a test for interaction in a
2 × 2 contingency table. Show that this is the same test as the test of
independence of the two classifications developed in Sec. 12.3.

12.3. If the entries in the cells of a 2 × 2 contingency table are
designated a, b, c, and d and a + b + c + d = n, it can be shown that the
exact probability of any observed set of entries is given by

$$\frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{n!\,a!\,b!\,c!\,d!},$$

where the marginal frequencies are assumed to be fixed in repeated
sampling. Use the formula and Stirling's approximation for factorials
for the data in Example 12.3.

12.4. Calculate chi-square corrected for continuity for Example 12.3. 

12.5. A more rigorous derivation of the quantity distributed as chi- 
square to be used in testing homogeneity of a binomial series starts by 
showing that 

$$\lim_{n \to \infty} C(n, x)\, p^{x} q^{n-x} = \frac{1}{\sqrt{2\pi npq}}\; e^{-(x - np)^2 / 2npq}.$$

Assuming this relationship, complete the argument. 

12.6. Using the approach suggested in Exercise 12.5, discuss an alter¬ 
nate method to that used in Sec. 12.5 of obtaining the quantity distributed 
as chi-square to be used in testing homogeneity of a Poisson series. 

12.7. The method used in Sec. 12.6 to obtain the distribution of p 
incidentally proves a general theorem which states that any continuous 
distribution may be transformed into a rectangular distribution. Using 
this result, complete the argument for showing that there exists at least 
one transformation which will transform any continuous frequency dis¬ 
tribution into any other continuous frequency distribution. 

12.8. Check the expression given for m{t) for log Vi in Sec. 12.7 by 
suitable modification of the results of Exercise 7.15. 

12.9. Apply Bartlett's test of homogeneity of variances to Federer's
data from seven nonselected strains of guayule which were in the 54 ±
chromosome group.

    s_i^2    9.28    6.80    7.26    7.43    9.99   14.02   10.80
    n_i       117     119     117     115     119     116     116

12.10. Check the agreement between values of Q in Sec. 12.7 and
Snedecor's F for k = 2 and $n_1 = n_2$ = 1, 2, 3, and 6. Is one of these
exact?

12.11. Bartlett, using published data of Hoblyn and Palmer given
below, obtained $\delta = 5.1$, an uncorrected $\chi^2 = 2.274$, and a corrected
$\chi^2 = 1.850$ in testing interaction in a 2 × 2 × 2 contingency table. The
experiment was planned to investigate the propagation of plum rootstocks
from root cuttings, the number of cuttings for the variety represented
being 240 for each of the four treatments (only one treatment is considered
here).

                            Alive                           Dead
                      Time of planting                Time of planting
    Length of
    cutting        At once  In spring  Total      At once  In spring  Total
    Long             156       84       240          84      156       240
    Short            107       31       138         133      209       342
    Total            263      115       378         217      365       582

Check Bartlett's results.

Part II

ANALYSIS OF EXPERIMENTAL MODELS 
BY LEAST SQUARES 




CHAPTER 13 

REGRESSION ANALYSIS 


13.1. Introduction. In Part I, we have presented some of the basic 
statistical concepts of estimation and tests of significance. We presented 
as our basic estimation procedure the method of maximum likelihood, 
because it has certain optimum properties such as giving a minimum 
variance estimator and a sufficient estimator if the latter exists. Now we
propose to consider the problem of estimating the value of some dependent
variate, Y, on the basis of information on one or more other fixed variates,
$X_1, X_2, \ldots$. A dependent variate will be understood to have a
probability distribution, while a fixed variate does not have a probability
distribution. Another way of saying this is that inferences are to be made
regarding the variability of Y for this particular set of X's. The Y's are
expected to fluctuate from sample to sample, while the X's remain fixed.
For example, we might wish to estimate the yield of wheat, Y, for different
amounts, X, of a standard fertilizer applied to the soil. Or, more exactly,
we might estimate this same yield on the basis of the amount of nitrogen,
$X_1$, phosphate, $X_2$, and potash, $X_3$, applied to the soil. There is a great
demand for information on the effect of temperature and precipitation on 
yields. The economist tries to predict future employment and price 
relationships on the basis of past data on the same and other economic 
variables. The engineer has the problem of estimating the probable 
length of life of roads or other structures in terms of such things as proba¬ 
ble use, type of construction, and weather conditions. The doctor must 
decide on the basis of certain measurements how much of a given drug 
can be safely administered to a patient. 

As indicated in a previous chapter on bivariate distributions, the
expected value of Y for a single fixed X gave the so-called regression curve
of Y on X. A first approximation to this curve was indicated to be a
straight line of the form

$$E(Y|X) \approx \alpha + \beta X.$$

In fact, if X and Y were distributed as a bivariate normal,

$$E(Y|X) = \alpha + \beta X.$$

Y is called the dependent variate and X the fixed variate. X is often
called the independent variate. However, it should be understood that in
[...] relationship between Y and X. It is well known that one can
approximate a short interval of most functions by a straight line. If we
collect data for a small range of X, it is possible that a straight line will
fit the data well even though the true relationship were curvilinear.

Let us assume that the measured value of Y can be written as

$$Y = E(Y|X) + \epsilon,$$

where $\epsilon$ [...] by the regression [...]. The regression curve is selected so
that the residuals are [...], with the usual added assumption that the $\epsilon$
are NID(0, $\sigma^2$). If Y is a linear function of r fixed variates, we might
write

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_r X_r + \epsilon,$$

where $E(Y) = \alpha + \beta_1 X_1 + \cdots + \beta_r X_r$. This model assumes
that the only error involved is $\epsilon$; there is no error in the
X's. Hence we are concerned only with the distribution of Y,
with the mean of Y being approximated by the regression equation.
If the X's are not measured without error, so that the X's
have probability distributions of their own, we must consider the
more complicated problem of multivariate analysis. We are
considering here only regression equations which are linear in the regression
coefficients, $\alpha$ and the $\beta$'s. Methods of handling multivariate problems
and problems of nonlinearity of the regression coefficients are beyond
the scope of this book. It should be emphasized that nonlinearity of the
X's can be handled by the introduction of appropriate terms in the
regression equation. For example, $X_2$ might represent $X_1^2$.

In order to estimate the relationship between Y and X (or between Y
and $X_1, X_2, \ldots$), simultaneous observations
will be obtained on Y and X. Each observed $Y_j$ can then be written in
terms of estimates of the parameters as

$$Y_j = \hat{Y}_j + e_j, \qquad j = 1, 2, \ldots, n,$$

where $\hat{Y}_j$ is the estimate of $E(Y_j)$ and $e_j$ the estimate of $\epsilon_j$. Hence if E(Y)
is a linear relationship,

$$Y_j = \alpha + \beta X_j + \epsilon_j = a + bX_j + e_j,$$

where $\alpha$ and $\beta$ are estimated by a and b. We can represent this situation
as in Fig. 13.1, where E(Y) is the true regression line and $\hat{Y}$ the estimated
regression line. For a given observation $(X_j, Y_j)$, the true error is given
by $\epsilon_j$ (QP), the estimated residual by $e_j$ (RP), and the difference between $\hat{Y}$
and E(Y) by $d_j$ ($= \epsilon_j - e_j$ = QR).

In order to obtain the values of a and b, at least two sets of observations
(X, Y) are required. Of course, if only two sets are obtained, $\hat{Y}$ will pass
through the two points, both values of e will be 0, and the values of a and
b can be determined by a simultaneous solution of the two equations
representing the two points. However, if more than two sets of
observations are obtained, we shall have the situation pictured in Fig. 13.1 with a
sample residual, e, corresponding to each set (X, Y). When there are more
than two sets of points, some new method of determining a and b must be
found. In the chapter on bivariate distributions, we indicated that a and
b could be determined by the method of moments. There are many
other methods of determining these estimates of the parameters, $\alpha$ and $\beta$,
in order to obtain the "best" linear fit to the data.

Fig. 13.1. True and estimated regression lines.

In any case, it seems reasonable to make the {e} as small as possible.
But what do we do to make these {e} small? Many courses are suggested,
among which are:

(i) Minimize the sum of the absolute values of the e.

(ii) Minimize the greatest of the absolute residuals.

(iii) Minimize the sum of the squares of the residuals.

Method (iii), called the "method of least squares," is probably the easiest
to apply and has certain optimum properties. It has been shown for
fixed X's that this method produces a linear unbiased estimate of Y which
has minimum variance. Also if the errors, $\epsilon$, are NID, the method of
least squares produces the same estimates as does the method of maximum
likelihood. The method of moments will also give the same estimates for
NID errors, provided the regression equation is linear in the parameters.

In the derivations which follow for estimating the parameters in the
regression equation, for example, $\alpha$ and $\beta$, we need postulate only that the
errors are noncorrelated and have the same variance. When the usual
tests of significance (such as t and F) and confidence limits are introduced,
it is necessary to assume further that the errors are normally and
independently distributed. Of course, noncorrelated normal errors are
independent. In general we shall assume NID errors, with the understanding
that the assumption of normality can be omitted if the investigator is
interested only in point estimates and not in confidence limits or tests of
significance.

13.2. Regression of Y on a Single Fixed Variate. Let us first consider
the problem of estimating the best linear relationship between Y and a
single fixed variate, X, so that

$$Y = \alpha + \beta X + \epsilon = a + bX + e,$$

where $\alpha$ and $\beta$ are the parameters and a and b their respective estimates,
$\epsilon$ is the true error and e the residual about the estimated regression line
($e = Y - \hat{Y}$, where $\hat{Y} = a + bX$). We assume that a sample of n X's
is selected (without error) and the corresponding values of Y's then
measured. If it is further assumed that the true errors ($\epsilon$) are
independently distributed with zero means and the same variance, the method
of least squares will produce unbiased and minimum variance estimates of
the parameters, $\alpha$ and $\beta$.

The error sum of squares (SSE) is†

$$SSE = Se^2 = S(Y - a - bX)^2.$$

a and b are to be determined so as to minimize SSE. The estimating
equation for a is

$$\frac{\partial\,SSE}{\partial a} = -2S(Y - a - bX) = 0, \qquad SY = na + bSX.$$

Hence

$$a = \frac{SY}{n} - b\,\frac{SX}{n} = \bar{Y} - b\bar{X}.$$

If we insert this value of a in SSE, we find that

$$SSE = S[(Y - \bar{Y}) - b(X - \bar{X})]^2 = S(y - bx)^2,$$

where $y = (Y - \bar{Y})$ and $x = (X - \bar{X})$. Hence we might as well have
written

$$\hat{Y} = \bar{Y} + bx \quad \text{instead of} \quad \hat{Y} = a + bX,$$

and similarly

$$E(Y) = \mu + \beta x,$$

where $\bar{Y}$ is the least-squares estimate of $\mu$ and $\alpha = \mu - \beta\bar{X}$. The least-

† In Part II the letter S will be used for summation over sample values, while $\Sigma$
will be reserved for a sum of fixed variates.




The predicted value of Y for a given X, say X', is

$$\hat{Y}' = \bar{Y} + b(X' - \bar{X}) = \bar{Y} + bx'.$$

If the experimenter wants to put confidence limits on $\hat{Y}'$, he must choose
one of the following:

(i) The confidence limits for the average of all Y', E(Y'), which might
occur for the given value X = X'.

(ii) The confidence limits for any single predicted value, Y'.

For (i) we need to determine the variance of the difference between an
ordinate on the computed regression line, $\hat{Y}'$, and the corresponding
ordinate on the true regression line, E(Y'). This difference is

$$\delta' = \hat{Y}' - E(Y') = (\bar{Y} - \mu) + (b - \beta)x',$$

with variance

$$\sigma^2(\delta') = \sigma^2 \left[ \frac{1}{n} + \frac{x'^2}{Sx^2} \right].$$

But for (ii) it is necessary to estimate the variance of the difference
between the ordinate on the computed regression line, $\hat{Y}'$, and the
corresponding true ordinate, Y'. This difference is

$$e' = \hat{Y}' - Y' = \delta' - \epsilon',$$

with variance

$$\sigma^2(e') = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{x'^2}{Sx^2} \right],$$

where $\epsilon'$ is not one of the $\epsilon$'s in the original sample of n.

If we now assume that the $\epsilon$ are NID(0, $\sigma^2$), SSE is distributed as $\chi^2\sigma^2$
with (n - 2) degrees of freedom, so that

$$s^2 = SSE/(n - 2)$$

is an unbiased estimate of $\sigma^2$. The proof is as follows:

$$SSE = S\left( \epsilon - \bar{\epsilon} - \frac{Sx\epsilon}{Sx^2}\,x \right)^2 = S\epsilon^2 - n\bar{\epsilon}^2 - \frac{(Sx\epsilon)^2}{Sx^2}.$$

Set $\sqrt{n}\,\bar{\epsilon} = \sqrt{n}\,(\bar{Y} - \mu) = z_0$ and $(Sx\epsilon)/\sqrt{Sx^2} = (b - \beta)\sqrt{Sx^2} = z_1$.
Then

$$SSE = S\epsilon^2 - z_0^2 - z_1^2.$$

But $z_0$ and $z_1$ are orthogonal linear functions of NID(0, $\sigma^2$) variables, each
z being itself NID(0, $\sigma^2$). Hence each $z^2$ is independently distributed as
$\chi^2\sigma^2$ with 1 degree of freedom, and SSE is then distributed as $\chi^2\sigma^2$ with
(n - 2) degrees of freedom.† Therefore we can use $s^2$ as an estimate of
$\sigma^2$ in the above variances and obtain the following confidence limits:

(i) $\hat{Y}' - t_\alpha s(\delta') < E(Y') < \hat{Y}' + t_\alpha s(\delta')$,

(ii) $\hat{Y}' - t_\alpha s(e') < Y' < \hat{Y}' + t_\alpha s(e')$,

where $s(\delta')$ and $s(e')$ are the same as the corresponding $\sigma$'s but with $\sigma$
replaced by s, and $P(t > t_\alpha) = \alpha/2$.

In both cases, we estimate $\hat{Y}' = \bar{Y} + bx'$, where $\bar{Y}$ and b were
computed from the previous sample. The two sets of confidence limits reflect
two different uses of this estimate: those for (i) are concerned with
estimating the average Y for the given X, while those for (ii) are concerned
with a single Y on a given experiment. It should be reemphasized that
we assume a distribution of Y's for each X and that the second confidence
interval is necessarily much wider because the variability of separate Y's
is also considered. It should be noted that $s^2(e') = s^2(\delta') + s^2$.

Fig. 13.2. Regression line and confidence bands.
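
The two standard errors can be compared numerically. In the sketch below the function name is an assumption, the values of $s^2$, n, $Sx^2$, and $\bar{X}$ are those of Example 13.1 later in this chapter, and the score X' = 70 is a hypothetical value chosen only for illustration.

```python
import math


def prediction_standard_errors(s2, n, sum_x_sq, x_dev):
    """Standard errors for the two kinds of confidence limits.

    s2: error mean square; n: sample size; sum_x_sq: Sx^2, the sum of
    squares of X about its mean; x_dev: x' = X' minus the mean of X.
    Returns (s(delta'), s(e')): the first for the average Y at X',
    the second for a single predicted Y at X'.
    """
    var_mean = s2 * (1.0 / n + x_dev ** 2 / sum_x_sq)
    var_single = s2 + var_mean
    return math.sqrt(var_mean), math.sqrt(var_single)


# Values from Example 13.1; X' = 70 is a hypothetical score.
se_mean, se_single = prediction_standard_errors(
    s2=1.704, n=909, sum_x_sq=96019.0, x_dev=70 - 57.04
)
print(f"s(delta') = {se_mean:.4f}, s(e') = {se_single:.4f}")
```

With so large a sample the limits for the average Y are narrow, while those for a single Y are governed almost entirely by s.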

These results are illustrated in Fig. 13.2. The reader will note that the
confidence lines form a hyperbola, with the curvature being much greater
for the average Y [$\hat{Y} \pm t s(\delta')$] than for a single predicted Y [$\hat{Y} \pm t s(e')$].

† See Sec. 14.3 for a formal proof of this. The essence of the proof is that SSE is
the sum of (n - 2) independent variables, each with 1 degree of freedom.

Finally we can use Student's t as a test criterion to test the null
hypothesis $\beta = \beta_0$:

$$t = (b - \beta_0)\sqrt{Sx^2}\,/\,s,$$

since the estimated variance of b will be $s^2/Sx^2$.

We note that

$$SSR = b^2 Sx^2 = (z_1 + \beta\sqrt{Sx^2})^2 = z_1^2 + 2z_1\beta\sqrt{Sx^2} + \beta^2 Sx^2.$$

Hence SSR is distributed as $\chi^2\sigma^2$ with 1 degree of freedom when $\beta = 0$,
and we can use the ratio

$$F = SSR/s^2$$

to test the null hypothesis that $\beta = 0$. Also

$$E(SSR) = \sigma^2 + \beta^2 Sx^2 > \sigma^2,$$

indicating that the one-tailed F test is the appropriate one. A convenient
method of displaying these results is the analysis-of-variance table:


    Source of      Degrees of    Sum of
    variation      freedom       squares    Mean square          E(MS)†
    Regression     1             SSR        MSR = SSR            σ² + β²Sx²
    Error          n - 2         SSE        s² = SSE/(n - 2)     σ²

† E(MS) = expected value of the mean square, where the mean square is the sum of
squares divided by the degrees of freedom.

One of the basic assumptions in the use of the method of least squares
is that the errors are noncorrelated. J. Durbin and G. S. Watson have
developed a statistic to test this assumption and have computed upper
and lower bounds for the significance levels. A discussion of regression
analysis when the variance is not assumed constant for all X's is presented
in Sec. 14.4.

The results of the regression analysis cannot be applied to the entire
(X, Y) population, only to the set of X's used in the analysis. C. P.
Winsor has prepared an excellent discussion of the problem of fitting
regressions when errors of measurement are present in one or both sets of
variates and when a random bivariate sample is secured. Wald and
Bartlett have presented methods of fitting a straight line when both
variables are subject to error.

If a random bivariate sample has been secured, methods have been 
devised to obtain estimates of and confidence limits for the value of X for 
a future Y, as well as for the value of Y for a future X. The method of 
least squares, regarding X as fixed, produces the same results as the 
bivariate solution when predicting Y for a future X but not for the inverse

Table 13.1
Data for Example 13.1†

      X     n      SY     n'     S'Y   |    X     n      SY     n'     S'Y
     39     3     1.9      3     1.9   |   65    11    17.4     11    17.4
     41     3     3.4      3     3.4   |   66    19    36.3     17    23.9
     42    18    13.1     18    13.1   |   67     8    15.3      7    10.0
     43     1      .7      1      .7   |   68     7    14.9      6    10.5
     44    11     7.1     11     7.1   |   69    11    23.8     10    19.8
     45    51    49.9     51    49.9   |   70    12    22.7     10    14.0
     46     3     2.6      3     2.6   |   71    13    23.4     13    23.4
     47    34    31.9     33    27.6   |   72     8    17.2      8    17.2
     48    86    78.3     86    78.3   |   73     9    12.2      9    12.2
     49    17    10.7     17    10.7   |   74    12    43.1      9    20.7
     50    51    44.3     51    44.3   |   75    14    34.0     13    28.5
     51    39    40.0     39    40.0   |   76    11    22.9     11    22.9
     52    31    36.2     31    36.2   |   77     9    33.0      6     9.0
     53    63    65.2     63    65.2   |   78     4     7.5      4     7.5
     54    45    64.5     45    64.5   |   79    14    29.6     13    23.8
     55    52    55.1     52    55.1   |   80     6    11.8      6    11.8
     56    39    58.0     39    58.0   |   81     6    17.5      5    13.2
     57    23    34.2     22    28.3   |   82     3    13.6      2     5.2
     58    30    37.7     30    37.7   |   83     4    32.0      2     3.8
     59    25    41.4     25    41.4   |   84     7    24.7      5    12.2
     60    17    26.0     16    22.0   |   85     2     5.5      2     5.5
     61    26    41.0     25    31.9   |   88     2     9.9      0     0
     62    11    15.6     10    11.4   |   89     1    16.3      0     0
     63    24    32.8     24    32.8   |   91     1    10.3      0     0
     64    12    29.5      9    16.4   |
                                       | Total  909  1,316.0   876  1,093.0

† X = socioeconomic score; Y = net farm income ($1,000); n = number of
farmers; n' = number of farmers with income less than $4,000.

problem of predicting X for a future Y. Eisenhart, Bliss, and Winsor
have discussed the latter problem when the method of least squares was
used to estimate the regression line.

Example 13.1. A study was made of the relationship between the
net income (Y) of 909 Southern farm families and a socioeconomic score
(X), the latter based on the possession of certain items such as radio,
telephone, automobile, and electricity and the education and church
attendance of the heads of the families. The possible range of X's was 39 to 91.
The number of sample families and the total income for each X are
presented in Table 13.1, for all families and for the families with incomes
less than $4,000. A sample of these data is presented graphically in
Fig. 13.3. The following sums and sums of squares and cross products
were obtained:






J DZ 




SX = 51,852, SY = 1,316, 

SX‘^ = 3,053,808, SXY = 81,621, SY^ = 3,898, 

where Y was in terms of thousands of dollars. The sums of squares and 



Fic. 13.8. Regression of net income on socioeconomic score. 


products adjusted for the means were 

^0^2 = 3,053,808 - (51,852)2/909 = 96,019, 

Sxy = 81,621 - (51,852)(l,316)/909 = 6,552.5, 

Sy^ = 3,898 - (1,316) 2/909 - 1,993. 

Hence 

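\[
\begin{aligned}
b &= Sxy/Sx^2 = .06824, \quad \bar{Y} = 1.448, \quad \bar{X} = 57.04,\\
\text{SSR} &= b\,Sxy = (Sxy)^2/Sx^2 = 447, \quad \text{SSE} = Sy^2 - \text{SSR} = 1{,}546,\\
s^2 &= \text{SSE}/907 = 1.704, \quad s = 1.305, \quad s(b) = s/\sqrt{Sx^2} = .004211,\\
t &= b/s(b) = 16.2.
\end{aligned}
\]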

The 95 per cent confidence limits for \(\beta\) are \(\beta = .06824 \pm (1.96)(.004211)\),
or

\[
.05999 < \beta < .07649,
\]

since the 95 per cent value of t is 1.96 for 907 degrees of freedom. Hence
one could state that on the average an increase of one unit on the
socioeconomic score would be associated with an increase of from $59.99 to
$76.49 in net income. The analysis-of-variance table is:


Source of variation    Degrees of freedom    Sum of squares    Mean square
Regression                     1                   447            447
Error                        907                 1,546              1.704

\[
F = 447/1.704 = 262.3 = t^2.
\]


There is an undoubtedly significant relationship between net income and
socioeconomic score, but the percentage of variation accounted for by the
regression is only (447)100/1,993 = 22 per cent. It was hoped that the
relationship would be close enough so that in future surveys an adequate 
measure of income could be obtained from the socioeconomic score, since 
it is much easier to obtain socioeconomic information than income data 
(most of the socioeconomic items can be observed by the interviewer; 
hence interviewing biases can be eliminated). However, if only 22 per 
cent of the total income variability can be explained by the socioeconomic 
score, some other means of estimating net income must be devised. An 
attempt was made to obtain a possibly better fit to the data displayed in 
Fig. 13.3 by redefining the population to contain only incomes less than 
$4,000. However, when a new fit was attempted, the proportional reduc¬ 
tion in sum of squares due to regression was even less; hence this approach 
also was abandoned. Since the variation about the regression line seemed 
to increase with increasing X, it was thought that a logarithmic relation¬ 
ship might fit the data better; however, there was no real improvement in 
the percentage of variability accounted for by the regression. 

The failure of the socioeconomic score to estimate net income is shown
in the 95 per cent confidence limits for a predicted value of Y (here
2.558 = 1.96s):

(i) \(E(Y') = \hat{Y}' \pm 2.558\sqrt{.00111 + x'^2/96{,}019}\),

(ii) \(Y' = \hat{Y}' \pm 2.558\sqrt{1.00111 + x'^2/96{,}019}\).

As an example, consider the confidence limits if X' = 80. In this case
\(x' = 80 - 57.04 = 22.96\) and
\(\hat{Y}' = \bar{Y} + bx' = 1.448 + (0.06824)(22.96) = 3.015\). Hence the 95 per
cent confidence limits are

(i) \(E(Y') = 3.015 \pm 2.558\sqrt{.006600} = 3.015 \pm .208\),

(ii) \(Y' = 3.015 \pm 2.558\sqrt{1.006600} = 3.015 \pm 2.567\).

Since it was hoped to use this regression equation for an individual family,
the appropriate set of confidence limits to consider is (ii):

\[
.448 < Y' < 5.582.
\]

Hence we could only estimate the income as falling between $448 and
$5,582 with 95 per cent confidence. The regression line, \(\hat{Y}\), and the 95
per cent confidence band for a single predicted Y are shown in Fig. 13.3.
Note that the confidence lines are practically parallel to \(\hat{Y}\), because
the variance of the prediction error is \(s^2(\hat{Y}') + s^2\), and \(s^2\) is the dominant term.
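The arithmetic of Example 13.1 can be retraced from the reported sums alone. The following is a minimal sketch, not the authors' computing procedure; the variable names and the helper `prediction_limits` are illustrative assumptions, and the value 1.96 is the one quoted in the text for 907 degrees of freedom. Final digits may differ slightly from the printed values because the text rounds intermediate results.

```python
import math

# Sums reported in Example 13.1 (Y in thousands of dollars)
n = 909
SX, SY = 51852.0, 1316.0
SX2, SXY, SY2 = 3053808.0, 81621.0, 3898.0

# Sums of squares and products adjusted for the means
Sx2 = SX2 - SX ** 2 / n
Sxy = SXY - SX * SY / n
Sy2 = SY2 - SY ** 2 / n

b = Sxy / Sx2                  # regression coefficient
x_bar, y_bar = SX / n, SY / n
SSR = b * Sxy                  # reduction due to regression
SSE = Sy2 - SSR                # error sum of squares
s2 = SSE / (n - 2)             # error mean square, 907 d.f.
s = math.sqrt(s2)
s_b = s / math.sqrt(Sx2)       # standard error of b
t = b / s_b
t95 = 1.96                     # value used in the text for 907 d.f.


def prediction_limits(X_new, individual=True):
    """95 per cent limits at X_new: (ii) for a single Y' if individual, else (i) for E(Y')."""
    x = X_new - x_bar
    y_hat = y_bar + b * x
    extra = 1.0 if individual else 0.0
    half = t95 * s * math.sqrt(extra + 1.0 / n + x ** 2 / Sx2)
    return y_hat - half, y_hat + half


lo, hi = prediction_limits(80)
print(f"b = {b:.5f}, s(b) = {s_b:.6f}, t = {t:.1f}")
print(f"limits for a single family at X' = 80: {lo:.3f} to {hi:.3f}")
```

Calling `prediction_limits(80, individual=False)` gives the much narrower limits (i) for the mean income at that score, which shows that the width of (ii) comes almost entirely from the error variance.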


EXERCISES 

13.1. Use the confidence limits for predicting E(Y') for a future X' to 
show that the confidence limits of X' for a future Y' are 




= X + 


h(Y' - Y) 
X 



(Y' - fy 


where X = and X, Y, 5, s, and Sx"^ are based on the original 

sample of n. 

13.2. What changes would be made in the confidence limits in Exercise
13.1 if \(Y'\) were the average of k observations?

13.3. The analysis of socioeconomic scores is simplified for those items
with only two alternatives, for example, with or without electricity.
Suppose we want to correlate the scores on one such item with income. Let
\(W_0\) be the score for each of the \(n_0\) families without this item and \(W_1\) the
score for each of the \(n_1\) families with this item (\(n_0 + n_1 = n\) total families).
Show that the correlation coefficient is independent of \(W_0\) and \(W_1\).

13.4. Girshick and Haavelmo have made an analysis of the demand for
food in the United States for the years 1922 to 1941. One equation in
their analysis involved the relationship between disposable income
adjusted for the cost of living (Y) and investment per capita adjusted for
the cost of living (\(X_1\)). The values of Y and \(X_1\) are shown in the
accompanying table.

Data for Exercise 13.4

    Y       X1          Y       X1          Y        X1
   87.4    92.9       107.8   142.9       103.1    114.3
   97.6   142.9        96.6    92.9       105.1    121.4
   96.7   100.0        88.9    97.6        96.4     78.6
   98.2   123.8        75.1    52.4       104.4    109.5
   99.8   111.9        76.9    40.5       110.7    128.6
  100.5   121.4        84.6    64.3       127.1    238.1
  103.2   107.1        90.6    78.6

Total                                   1,950.7  2,159.7
(a) Set up a simple linear regression of Y on \(X_1\), and determine the
constants in the regression equation.

(b) Set up the analysis of variance.

(c) Make a test of significance of the usefulness of the regression
equation. Are there any aspects of these data which might invalidate this
test?

(d) Plot the data (rounded to nearest integer), and draw in the
regression line and the 95 per cent confidence lines for \(E(Y|X_1)\). From the
nature of the residuals from the regression line, would you suggest any
changes in the form of the regression equation?

13.5. R. A. Fisher has compared the body weights (in kilograms) with
the heart weights (in grams) of 47 female and 97 male cats. The sums
of squares and products were as follows:

                         Degrees of
                          freedom     (Body)²    (Body × heart)    (Heart)²
Females:
  Total                      47       265.13        1029.62         4064.71
  Correction for mean         1       261.677       1020.516        3979.92
  Difference                 46         3.453          9.104          84.79
Males:
  Total                      97       836.75        3275.55        13056.17
  Correction for mean         1       815.77        3185.07        12435.70
  Difference                 96        20.98          90.48          620.17


(a) Determine the regression of heart weight on body weight for both 
males and females. 

(b) Are these two regressions different from one another? 

(c) Are the two error variances essentially the same? 

13.6. In a study of lobster population, D. B. DeLury presents the
following data on the catch per unit of effort for the time interval t, C(t),
and the total catch up to t, K(t), in thousands of pounds:

t     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
C   .82  .75  .94  .80  .83  .89  .70  .58  .64  .55  .52  .45  .45  .49  .45  .48  .43
K     0    7   13   16   22   25   32   37   40   45   50   53   54   55   57   60   62

(a) A linear equation of the form C = a + bK + e was set up. Determine
the values of a and b and their standard errors.

(b) The total population at time t = 0 is estimated by \(N_0 = -a/b\).
Determine \(N_0\).
13.7. Read a brief note in the December, 1948, issue of the American
Statistician (pages 16 to 17) on the use of regression methods for business 
statistics. 

13.8. Suppose a sample of \(n_1\) is used to estimate the parameters in the
equation \(Y_1 = \mu_1 + \beta_1x_1 + \epsilon_1\) and a sample of \(n_2\) for the equation
\(Y_2 = \mu_2 + \beta_2x_2 + \epsilon_2\), where \(\sigma_1^2\) is assumed to equal \(\sigma_2^2\). How would you
test the null hypothesis that \(\beta_1 = \beta_2\)? What would you do if \(\sigma_1^2 \ne \sigma_2^2\)?

13.9. (a) Show that \(\bar{Y}\) and b are the ML estimates of \(\mu\) and \(\beta\),
respectively.

(b) What is the ML estimate of \(\sigma^2\)? Is this estimate unbiased?

13.10. Use the data in Table 13.1 for incomes less than $4,000 to fit a
new regression of net income on socioeconomic score (\(Sy^2 = 623\) for these
incomes under $4,000).

13.11. As mentioned in Example 13.1, a logarithmic relationship was
also used. Since there were some negative incomes, Z = log (Y + 1)


Data for Exercise 13.11

  X       SZ         S'Z          X       SZ        S'Z
 39      .5215      .5215        65     4.4110    4.4110
 41      .9755      .9755        66     7.7002    5.9932
 42     3.8118     3.8118        67     3.3000    2.5018
 43      .2217      .2217        68     3.1725    2.4361
 44     2.2455     2.2455        69     5.1685    4.6628
 45    13.8419    13.8419        70     5.1154    3.6538
 46      .7811      .7811        71     5.3964    5.3964
 47     9.0067     8.2792        72     3.6718    3.6718
 48    22.2637    22.2637        73     3.0164    3.0164
 49     3.3421     3.3421        74     7.2008    4.5188
 50    12.7787    12.7787        75     7.0135    6.2004
 51    11.1653    11.1653        76     5.2177    5.2177
 52     9.6366     9.6366        77     4.9095    2.0622
 53    18.1929    18.1929        78     1.7054    1.7054
 54    16.1329    16.1329        79     6.3518    5.5146
 55    15.4700    15.4700        80     2.8142    2.8142
 56    14.7077    14.7077        81     3.4868    2.7624
 57     7.9962     7.1611        82     2.0615    1.0923
 58     9.8522     9.8522        83     3.2244     .9187
 59     9.8317     9.8317        84     4.3589    2.6452
 60     6.3001     5.5959        85     1.1115    1.1415
 61     9.3233     8.3207        88     1.5469    0
 62     3.9073     3.1950        89     1.2370    0
 63     8.2960     8.2960        91     1.0531    0
 64     5.6077     3.4258

Total                                 310.4883  282.3832
was used as the dependent variate with X still the fixed variate (Y in
terms of thousands of dollars). The data for this regression analysis are
presented in the table (the values of n and n' are not reproduced here,
as they are the same as in Table 13.1).

(a) Fit a new regression line using all incomes, but with Z = log (Y + 1)
as the dependent variate. (\(Sz^2 = 32.279\).)

(b) Do the same for the incomes under $4,000. (\(Sz^2 = 22.470\).)
CHAPTER 14


GENERAL REGRESSION MODEL WITH r FIXED VARIATES 

14.1. Introduction. Let us suppose that we can approximate Y by
means of the general linear equation

\[
Y = \mu + \sum_{i=1}^{r} \beta_ix_i + \epsilon = \bar{Y} + \sum_{i=1}^{r} b_ix_i + e, \tag{1}
\]

where the regression coefficients \(\{\beta_i\}\) are to be determined from \(n\ (> r + 1)\)
simultaneous observations on Y and the \(X_i\)'s (\(x_i = X_i - \bar{X}_i\)). The first
relationship represents the true experimental model in terms of the
parameters (\(\mu\) and the \(\beta_i\)) and the true error, \(\epsilon\), while the second is in
terms of the estimates of these parameters and the residual, e. The b's
are determined by minimizing the sum of the squared residuals.

The usual assumptions are: 

(i) The \(\{X_i\}\) are fixed variates and may be looked upon as population
parameters. Often the X's are chosen deliberately and the Y's are
produced or chosen at random.

(ii) For a fixed set of X's, say \(\{X_i'\}\), the Y's associated with this set
are NID with mean \(E(Y') = \mu + \sum_{i=1}^{r} \beta_ix_i'\) and variance \(\sigma^2\). The observed
regression surface is \(\hat{Y}' = \bar{Y} + \sum b_ix_i'\). As mentioned in Sec. 13.1, the
assumption of normality is required only when confidence limits and tests
of significance are used.

(iii) For any set of X's, the variance of Y shall be the same; this is the
assumption of homoscedasticity.

Even though we are considering only one Y for each set of X's, it is
understood that there is an underlying normal population of Y's for each
set of X's and that the residuals from the true regression surface are
NID(0, \(\sigma^2\)). The assumption of fixed X's indicates that the results cannot
be applied to the entire multivariate \(\{Y, X_i\}\) population, but only to the sets
of X's used in the analysis.

The form of the general equation should be based on some theoretical 
framework, which a research man in the particular field of application 
should be asked to furnish. In many cases the particular set of fixed 
variates may not be the ideal ones from a theoretical point of view, but 
they may be the best substitutes for which data are available. Also, the
form of the regression model might not be ideal, but often the ease of 
computing from a linear model will outweigh the advantage of using a 
more exact but cumbersome model. Of course, if the nonlinear model is 
simply a multiplicative one, it can be linearized by use of a logarithmic 
transformation, provided the error is also multiplicative. 

In a graduate thesis, Monroe has presented some of the uses of
nonlinear models for nutritional experiments, plus an extensive bibliography
on the subject. An article on this subject by Hartley outlines a method
of using least-squares estimation for nonlinear parameters.

The error sum of squares, which is to be minimized, is

\[
\text{SSE} = Se^2 = S\Bigl(y - \sum_{i=1}^{r} b_ix_i\Bigr)^2, \tag{2}
\]

where we shall use S for summation over the sample values and \(\sum\) for
summation of variates. The following general equation is obtained when
SSE is minimized with respect to \(b_k\) (k = 1, 2, . . . , r):

\[
b_1Sx_kx_1 + b_2Sx_kx_2 + \cdots + b_kSx_k^2 + \cdots + b_rSx_kx_r = Sx_ky. \tag{3}
\]

In order to simplify the presentation which follows, let \(Sx_ix_j = a_{ij}\) and
\(Sx_iy = g_i\), where \(a_{ij} = a_{ji}\). Then we can write the set of r equations in
the r unknown \(\{b_i\}\) as follows:

\[
\begin{aligned}
a_{11}b_1 + a_{12}b_2 + \cdots + a_{1k}b_k + \cdots + a_{1r}b_r &= g_1,\\
a_{21}b_1 + a_{22}b_2 + \cdots + a_{2k}b_k + \cdots + a_{2r}b_r &= g_2,\\
\cdots\\
a_{k1}b_1 + a_{k2}b_2 + \cdots + a_{kk}b_k + \cdots + a_{kr}b_r &= g_k,\\
\cdots\\
a_{r1}b_1 + a_{r2}b_2 + \cdots + a_{rk}b_k + \cdots + a_{rr}b_r &= g_r.
\end{aligned} \tag{4}
\]

These are called the normal equations.

This system of r linear equations can be solved for the \(\{b_i\}\) by the usual
methods of simultaneous equations given in elementary algebra. Methods
of solution have been presented by Snedecor and in a special computing
manual by Wallace and Snedecor. However, we believe that, in general,
it is better to solve for the b's by use of a method called matrix inversion.
Computing techniques have been devised so that they can be followed
without a knowledge of matrix theory. A detailed discussion of these
techniques and the necessary matrix theory is presented by Dwyer.
We shall present two computing techniques in Chap. 15.

In order to determine the b's by a method of matrix inversion, an
intermediate computing step is necessary. A new set of constants
\(\{c_{ij}\}\) must be determined. Suppose we arrange the a's and c's in rows
and columns as follows:

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r}\\
a_{21} & a_{22} & \cdots & a_{2r}\\
\vdots & \vdots & & \vdots\\
a_{r1} & a_{r2} & \cdots & a_{rr}
\end{bmatrix},
\qquad
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1r}\\
c_{21} & c_{22} & \cdots & c_{2r}\\
\vdots & \vdots & & \vdots\\
c_{r1} & c_{r2} & \cdots & c_{rr}
\end{bmatrix}.
\]

These arrangements are called matrices, and the individual a's and c's are
called elements. The c's must satisfy the following conditions:

\[
\sum_{j=1}^{r} a_{ij}c_{jk} =
\begin{cases}
1, & i = k,\\
0, & i \ne k.
\end{cases} \tag{5}
\]

In other words, the sum of the products of corresponding elements in a
row of A and a column of C must be unity for the same row and column
(for example, the first row of A and the first column of C) and must be
zero for an unlike row and column (for example, the first row of A and the
second column of C). The C matrix is called the inverse of the A matrix.
The computing techniques mentioned above refer to the computations
required to determine the c's.

After the c's have been determined, the solution for \(b_k\) is as follows:

\[
b_k = \sum_{j=1}^{r} c_{kj}g_j = \sum_{j=1}^{r} c_{kj}Sx_jy, \qquad k = 1, 2, \ldots, r. \tag{6}
\]

In other words, any \(b_k\) can be computed by adding up the products of the
elements in the kth row of C times the corresponding g's. A brief
discussion of this matrix theory is presented in Sec. 14.2 for those readers
who desire a more theoretical presentation.
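The route from the normal equations to the b's through the inverse matrix can be sketched in a few lines. This is only an illustration under stated assumptions: the data `X1`, `X2`, `Y` are hypothetical, the function names `invert` and `fit` are not from the text, and Gauss-Jordan elimination is swapped in as a convenient way to obtain C (the text's own computing techniques are deferred to Chap. 15).

```python
def invert(a):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination."""
    r = len(a)
    # Augment A with the identity matrix: [A | I]
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(r)]
         for i, row in enumerate(a)]
    for col in range(r):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        piv = max(range(col, r), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for i in range(r):
            if i != col:
                f = m[i][col]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[col])]
    return [row[r:] for row in m]


def fit(X, Y):
    """X is a list of r columns of n values each; Y is a list of n values."""
    n, r = len(Y), len(X)
    y_bar = sum(Y) / n
    y = [v - y_bar for v in Y]
    x = [[v - sum(col) / n for v in col] for col in X]   # deviations from means
    a = [[sum(p * q for p, q in zip(x[i], x[j])) for j in range(r)]
         for i in range(r)]                              # a_ij = S x_i x_j
    g = [sum(p * q for p, q in zip(x[i], y)) for i in range(r)]  # g_i = S x_i y
    c = invert(a)                                        # C, the inverse of A
    b = [sum(c[k][j] * g[j] for j in range(r)) for k in range(r)]  # equation (6)
    Sy2 = sum(v * v for v in y)
    SSR = sum(bi * gi for bi, gi in zip(b, g))
    SSE = Sy2 - SSR
    return {"a": a, "g": g, "c": c, "b": b, "SSR": SSR, "SSE": SSE,
            "s2": SSE / (n - r - 1)}


# Hypothetical data, roughly Y = 1 + 2*X1 + 3*X2 plus small errors
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [9.1, 7.9, 19.2, 17.8, 29.0, 28.1]
res = fit([X1, X2], Y)
print("b =", [round(v, 3) for v in res["b"]])
print("variance factors c_kk =", [round(res["c"][k][k], 4) for k in range(2)])
```

One reason for preferring the inverse over a one-off solution of the equations is visible in the output: the diagonal elements of C are kept, and they are what is needed later for the variances of the b's.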

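From the model equation (1), we see that

\[
y = Y - \bar{Y} = \sum_i \beta_ix_i + \epsilon - \bar{\epsilon},
\]

where \(\bar{\epsilon} = S\epsilon/n\). Hence using the definition \(Sx_jx_i = a_{ji}\) and equation (5),

\[
b_k = \sum_j c_{kj}Sx_j\Bigl(\sum_i \beta_ix_i + \epsilon - \bar{\epsilon}\Bigr)
= \sum_i \beta_i \sum_j c_{kj}a_{ji} + \sum_j c_{kj}Sx_j\epsilon
= \beta_k + \sum_j c_{kj}Sx_j\epsilon.
\]

Since the \(\{\epsilon\}\) are NID(0, \(\sigma^2\)),

\[
\begin{aligned}
E(b_k) &= \beta_k \quad \text{(indicating } b_k \text{ is an unbiased estimate of } \beta_k\text{)};\\
\sigma^2(b_k) &= E[(b_k - \beta_k)^2] = E\Bigl[\Bigl(\sum_j c_{kj}Sx_j\epsilon\Bigr)^2\Bigr] = c_{kk}\sigma^2;\\
\sigma(b_ib_k) &= c_{ik}\sigma^2;\\
\sigma^2(b_i - b_k) &= (c_{ii} - 2c_{ik} + c_{kk})\sigma^2.
\end{aligned}
\]

In order to prove that \(\sigma^2(b_k) = c_{kk}\sigma^2\), we make use of the linear-form
techniques of Part I. \(\sum_j c_{kj}Sx_j\epsilon = l\) can be represented as a linear
function of the n \(\epsilon\)'s, that is,

\[
l = \sum_{p=1}^{n} w_p\epsilon_p,
\]

where \(w_p = \sum_j c_{kj}x_{jp}\) and \(x_{jp}\) is the pth observation on \(x_j\). Since the \(\epsilon\) are
NID(0, \(\sigma^2\)), we can write

\[
\sigma^2(l) = \sigma^2 Sw_p^2 = c_{kk}\sigma^2,
\]

since

\[
Sw_p^2 = S\Bigl(\sum_{j=1}^{r} c_{kj}x_{jp}\Bigr)^2
= \sum_{j=1}^{r} c_{kj} \sum_{j'=1}^{r} c_{kj'}a_{j'j} = c_{kk},
\qquad
\sum_{j'=1}^{r} c_{kj'}a_{j'j} =
\begin{cases}
1 & \text{when } k = j,\\
0 & \text{when } k \ne j.
\end{cases}
\]

A similar proof can be set up for \(\sigma(b_ib_k)\).

It should be noted that since the \(\epsilon\) are assumed NID(0, \(\sigma^2\)), the \(\{b_i\}\) are
multivariate normally distributed with means \(\{\beta_i\}\), variances \(c_{ii}\sigma^2\), and
covariances \(c_{ij}\sigma^2\).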

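The error sum of squares, as given in equation (2), is

\[
\begin{aligned}
\text{SSE} &= S\Bigl(y - \sum_{i=1}^{r} b_ix_i\Bigr)^2\\
&= Sy^2 - \Bigl(\sum_{i=1}^{r} b_iSx_iy\Bigr) + \sum_{i=1}^{r} b_i\Bigl(\sum_{k=1}^{r} b_kSx_ix_k - Sx_iy\Bigr)\\
&= Sy^2 - \sum_{i=1}^{r} b_iSx_iy,
\end{aligned}
\]

because the values in the last parentheses of the second line are simply the
normal equations, where \(\sum_{k=1}^{r} b_kSx_ix_k = Sx_iy\). Hence the reduction due to
regression is

\[
\text{SSR} = \sum_{i=1}^{r} b_iSx_iy = R^2Sy^2,
\]

where R is called the multiple correlation coefficient.

In Sec. 14.3, we shall prove that†

\[
\begin{aligned}
E(\text{SSE}) &= (n - r - 1)\sigma^2, \qquad s^2 = \text{SSE}/(n - r - 1),\\
E(\text{SSR}) &= r\sigma^2 \quad \text{when all } \beta_i = 0.
\end{aligned}
\]

Hence \(F = \text{SSR}/rs^2\) can be used to test the null hypothesis that all \(\beta_i = 0\).
F has (r, n − r − 1) degrees of freedom. Also, \(E(\text{SSR}) > r\sigma^2\) when some
\(\beta_i \ne 0\).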

If it is desired to know if the last (r − k) of the r fixed variates made a
significant contribution to SSR, we can obtain the reduction due to the
first k fixed variates by using

\[
\hat{Y} = \bar{Y} + b_1'x_1 + \cdots + b_k'x_k.
\]

This reduction will be called SSR*. Then the added reduction due to the
last (r − k) fixed variates is (SSR − SSR*). The expected value of
(SSR − SSR*) is a function of only the last (r − k) \(\beta\)'s. Hence we can
test the null hypothesis that each of these (r − k) \(\beta\)'s vanishes without
saying anything about the first k fixed variates. The analysis of variance
is:

Source of variation                  Degrees of freedom    Mean square              E(MS)
First k fixed variates                      k
Added reduction by last
  (r − k) fixed variates                  r − k            (SSR − SSR*)/(r − k)     \(\sigma^2 + \theta(\beta_{k+1}, \ldots, \beta_r)\)†
Error                                   n − r − 1          \(s^2\)                  \(\sigma^2\)

† \(\theta\) is a function of only \(\{\beta_{k+1}, \beta_{k+2}, \ldots, \beta_r\}\).

Under the null hypothesis \(H_0\): \(\{\beta_{k+1} = \beta_{k+2} = \cdots = \beta_r = 0\}\), \(\theta = 0\).
Hence

\[
F = \frac{\text{SSR} - \text{SSR}^*}{s^2(r - k)}
\]

with (r − k) and (n − r − 1) degrees of freedom.
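As a small numerical illustration of this test, the sketch below computes the F ratio for the added reduction from the three sums of squares. The function name `added_reduction_F` and the numbers in the usage lines are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def added_reduction_F(SSR, SSR_star, SSE, n, r, k):
    """F ratio for the added reduction due to the last (r - k) fixed variates.

    SSR      reduction due to all r fixed variates
    SSR_star reduction due to the first k fixed variates alone
    SSE      error sum of squares after fitting all r variates
    """
    if not 0 <= k < r < n - 1:
        raise ValueError("need 0 <= k < r < n - 1")
    s2 = SSE / (n - r - 1)                    # error mean square
    F = (SSR - SSR_star) / (s2 * (r - k))     # added reduction per d.f. over s^2
    return F, (r - k, n - r - 1)


# Hypothetical sums of squares: n = 25 observations, r = 4 variates, first k = 2
F, df = added_reduction_F(SSR=120.0, SSR_star=90.0, SSE=60.0, n=25, r=4, k=2)
print(f"F = {F:.2f} with {df} degrees of freedom")
```

With k = 0 and SSR* = 0 the same function reduces to the overall test F = SSR/rs² of the hypothesis that all the regression coefficients vanish.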

Exercises 14.1 to 14.10 pertain to Sec. 14.1. 

14.2. Matrix Algebra.‡ We shall digress here in order that the reader
who is unfamiliar with the methods of matrix algebra may become 
acquainted with the necessary concepts and techniques used in simplify¬ 
ing the theory and computations of regression. 

t §2 is denoted as in most discussions of regression analysis. 

‡ The reader may omit this section if he does not want a more theoretical discussion
than Sec. 14.1.




A matrix is an array of quantities and may be represented as follows:

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

The quantities \(a_{ij}\) are called the elements of the matrix. This is a matrix
of m rows and n columns. If m = n, the matrix is called a square matrix.
The number of rows or columns of the square matrix is called the order of
the matrix. If \(a_{ij} = a_{ji}\), the matrix is called a symmetric matrix. Two
matrices are equal if and only if corresponding elements are identical.
For the most part we shall be concerned in regression analyses with square
symmetric matrices.

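Two matrices are added in the following manner:

\[
\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}
+
\begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12}\\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}.
\]

Subtraction is defined in a similar fashion.

In multiplying two matrices, the elements of the product matrix are
obtained by multiplying the elements of a row of the first matrix by the
corresponding elements of a column of the second matrix and adding
these product terms. For example,

\[
\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22}\\
a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}
\end{bmatrix}.
\]

If we let A stand for the first matrix on the left above and B the second,
it is obvious that A · B may not necessarily be equal to B · A.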

Division of one matrix by another is defined as an inverse operation of
multiplication. Let

\[
A \cdot B = G.
\]

Then, multiplying the equation on the left by \(A^{-1}\) (the inverse of A), we
find

\[
A^{-1} \cdot A \cdot B = A^{-1}G. \tag{7}
\]

Now, \(A^{-1}\) is defined to be a matrix such that

\[
A^{-1} \cdot A = A \cdot A^{-1} = I,
\]

where I is called the identity matrix and plays the same role in matrix
algebra that 1 plays in ordinary algebra. The identity matrix I in terms
of its elements is

\[
I = \begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]

Returning to equation (7), we see that

\[
B = A^{-1}G.
\]

We can now define the operation of dividing matrix G by matrix A as
yielding the matrix B obtained by multiplying matrix G on the left by
the inverse matrix of A.

In order to explain a general method of obtaining the inverse of a
matrix, we need to recall the definition and basic properties of determinants.
The elements of a square matrix determine the determinant of the matrix.
A determinant is a function of the elements of a square array. It may be
expressed as a polynomial by expanding the determinant by minors
(the cofactor, or Laplace, expansion).

The order of a determinant is its number of rows or columns. A minor
of an element of a determinant is the determinant of one less order found
by striking out the row and column containing the particular element.
The cofactor \(A_{ij}\) of the element \(a_{ij}\) is obtained by multiplying \((-1)^{i+j}\) by
the minor of \(a_{ij}\). The cofactor expansion permits us to evaluate a determinant by
multiplying the elements of any row or column by the corresponding
cofactors and summing these products.

The element \(a^{ij}\) of the inverse of the matrix A is obtained by
dividing the cofactor \(A_{ji}\) by the determinant |A| of the matrix, that is,

\[
a^{ij} = \frac{A_{ji}}{|A|}.
\]

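Example 14.1. In order to find the inverse \(A^{-1}\) of the matrix

\[
A = \begin{bmatrix} 0 & 3 & 2\\ 1 & 2 & 4\\ 3 & 0 & 2 \end{bmatrix},
\]

we evaluate the determinant of this matrix:

\[
|A| = 0\begin{vmatrix} 2 & 4\\ 0 & 2 \end{vmatrix}
- 3\begin{vmatrix} 1 & 4\\ 3 & 2 \end{vmatrix}
+ 2\begin{vmatrix} 1 & 2\\ 3 & 0 \end{vmatrix}
= 0 - 3(2 - 12) + 2(0 - 6) = 18.
\]

The cofactors are

\[
A_{11} = (-1)^{1+1}\begin{vmatrix} 2 & 4\\ 0 & 2 \end{vmatrix} = 4,
\qquad
A_{12} = (-1)^{1+2}\begin{vmatrix} 1 & 4\\ 3 & 2 \end{vmatrix} = 10,
\]

and so forth. We find the complete matrix of the cofactors to be

\[
[A_{ij}] = \begin{bmatrix} 4 & 10 & -6\\ -6 & -6 & 9\\ 8 & 2 & -3 \end{bmatrix}.
\]

Upon interchanging rows and columns and dividing each element by |A|,
we obtain the inverse matrix

\[
A^{-1} = \begin{bmatrix}
\tfrac{4}{18} & -\tfrac{6}{18} & \tfrac{8}{18}\\
\tfrac{10}{18} & -\tfrac{6}{18} & \tfrac{2}{18}\\
-\tfrac{6}{18} & \tfrac{9}{18} & -\tfrac{3}{18}
\end{bmatrix}.
\]

We may verify numerically that

\[
A \cdot A^{-1} = A^{-1} \cdot A = I = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.
\]

For example, the first element of the I matrix is obtained as

\[
0 \cdot \bigl(\tfrac{4}{18}\bigr) + 3 \cdot \bigl(\tfrac{10}{18}\bigr) + 2 \cdot \bigl(-\tfrac{6}{18}\bigr) = 1.
\]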
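The cofactor method of Example 14.1 can be checked mechanically. The sketch below is an illustration only; the helper names (`minor`, `det`, `cofactor_matrix`, `inverse`, `matmul`) are assumptions introduced here, and exact fractions are used so that the check A · A⁻¹ = I holds without rounding.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix left after striking out row i and column j."""
    return [[v for c, v in enumerate(row) if c != j]
            for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def cofactor_matrix(m):
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
            for i in range(n)]


def inverse(m):
    """Transpose the cofactor matrix and divide each element by the determinant."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    cof = cofactor_matrix(m)
    n = len(m)
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[0, 3, 2], [1, 2, 4], [3, 0, 2]]
A_inv = inverse(A)
print("determinant:", det(A))
print("cofactors:", cofactor_matrix(A))
print("A times its inverse:", matmul(A, A_inv))
```

Expansion by cofactors grows very quickly in cost with the order of the matrix, which is one reason the text turns to indirect methods for the larger systems met in regression work.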
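Some of the results of this section will now be used to simplify and
develop the theory of Sec. 14.1. The normal equations of Sec. 14.1 may
be written in matrix form as

\[
A \cdot B = G, \tag{4'}
\]

where

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r}\\
a_{21} & a_{22} & \cdots & a_{2r}\\
\vdots & \vdots & & \vdots\\
a_{r1} & a_{r2} & \cdots & a_{rr}
\end{bmatrix},
\qquad
B = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_r \end{bmatrix},
\qquad \text{and} \qquad
G = \begin{bmatrix} g_1\\ g_2\\ \vdots\\ g_r \end{bmatrix}.
\]

The matrix A is a square symmetric matrix, while the matrices B and G
are single-column matrices.

Upon multiplying equation (4') on the left by \(A^{-1}\), we find

\[
A^{-1} \cdot A \cdot B = A^{-1}G, \qquad \therefore\ B = A^{-1}G.
\]

Writing \(C = A^{-1}\), this is

\[
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_k\\ \vdots\\ b_r \end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1r}\\
c_{21} & c_{22} & \cdots & c_{2r}\\
\vdots & \vdots & & \vdots\\
c_{k1} & c_{k2} & \cdots & c_{kr}\\
\vdots & \vdots & & \vdots\\
c_{r1} & c_{r2} & \cdots & c_{rr}
\end{bmatrix}
\begin{bmatrix} g_1\\ g_2\\ \vdots\\ g_r \end{bmatrix}.
\]

Hence, since two matrices are equal if and only if corresponding elements
are equal, we see that

\[
b_k = c_{k1}Sx_1y + c_{k2}Sx_2y + \cdots + c_{kr}Sx_ry = \sum_j c_{kj}Sx_jy,
\]

which is the same as the results of equation (6).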

Instead of inverting the matrix A directly, as illustrated earlier, in
order to obtain the elements \(c_{ij}\) of \(C = A^{-1}\), it is more convenient to
proceed indirectly. We know that

\[
A \cdot A^{-1} = I
\]

or

\[
A \cdot C = I.
\]

Writing the elements in for each matrix, we have

\[
\begin{bmatrix}
Sx_1^2 & Sx_1x_2 & \cdots & Sx_1x_r\\
Sx_1x_2 & Sx_2^2 & \cdots & Sx_2x_r\\
\vdots & \vdots & & \vdots\\
Sx_1x_r & Sx_2x_r & \cdots & Sx_r^2
\end{bmatrix}
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1r}\\
c_{21} & c_{22} & \cdots & c_{2r}\\
\vdots & \vdots & & \vdots\\
c_{r1} & c_{r2} & \cdots & c_{rr}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]

Again, since two matrices are equal if and only if the corresponding
elements are equal, we can immediately write down r sets of equations. For
example, the first set may be obtained by multiplying all the rows of the
A matrix by the first column of the C matrix and setting these sums of
products in turn equal to the elements of the first column of the identity
matrix. We obtain

\[
\begin{aligned}
c_{11}Sx_1^2 + c_{21}Sx_1x_2 + \cdots + c_{r1}Sx_1x_r &= 1,\\
c_{11}Sx_1x_2 + c_{21}Sx_2^2 + \cdots + c_{r1}Sx_2x_r &= 0,\\
\cdots\\
c_{11}Sx_1x_r + c_{21}Sx_2x_r + \cdots + c_{r1}Sx_r^2 &= 0.
\end{aligned}
\]

In a similar fashion the other (r − 1) sets of equations may be obtained.
The solution of these sets of equations enables us to find the elements of
the inverse matrix \(C = A^{-1}\). Short methods for obtaining simultaneous
solutions to all r sets of equations will be presented in Chap. 15. It should
be noted that \(c_{ij} = c_{ji}\) because of the symmetry of matrix A.

14.3. Theory of Tests of Significance.† For references to the theory
of tests of significance with regression problems see Bartlett, Yates, and
R. A. Fisher.

The equation \(Y = \mu + \sum \beta_ix_i + \epsilon = \bar{Y} + \sum b_ix_i + e\) can be replaced by
a new equation

\[
Y = \mu + \sum \beta_i'z_i + \epsilon = \bar{Y} + \sum b_i'z_i + e,
\]

where \(\{z_i\}\) are functions of \(\{x_i\}\) so constructed that \(\{z_i\}\) are completely
orthogonal variates. As before, the \(\epsilon\) are NID(0, \(\sigma^2\)). Hence the Y's are
NID(\(\mu + \sum \beta_i'z_i\), \(\sigma^2\)). We might write

\[
\begin{aligned}
z_1 &= w_{11}x_1,\\
z_2 &= w_{21}x_1 + w_{22}x_2,\\
&\cdots\\
z_r &= w_{r1}x_1 + w_{r2}x_2 + \cdots + w_{rr}x_r,
\end{aligned}
\]

where \(Sz_i^2 = 1\) and \(S(z_iz_j) = 0\) (\(i \ne j\)). Hence

\[
Y = \mu + \sum \beta_ix_i + \epsilon = \mu + \sum \beta_i'z_i + \epsilon,
\]

where \(\{x_i\}\) are solved backward in terms of \(\{z_i\}\). The least-squares
estimate of \(\beta_i'\) is \(b_i' = Sz_iy = Sz_iY\), because of the orthogonal relationships.
Also,

\[
\text{SSE} = S\Bigl(y - \sum b_i'z_i\Bigr)^2 = Sy^2 - \sum_{i=1}^{r} (b_i')^2.
\]

Hence \(\sum (b_i')^2\) is the reduction in sum of squares due to the regression.
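The orthogonalization just described can be tried numerically. The sketch below uses the Gram-Schmidt process as one way of constructing the z's (an assumption; the text does not name a construction here), with hypothetical data and illustrative names. It checks that the sum of the squared orthogonal coefficients equals the reduction obtained from the ordinary normal equations, and that the first squared coefficient alone is the reduction due to the first variate.

```python
import math


def centered(v):
    """Deviations from the mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def S(u, v):
    """Sum of products over the sample (the S operator of the text)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def orthonormalize(xs):
    """Gram-Schmidt: z_i depends only on x_1..x_i, S z_i^2 = 1, S z_i z_j = 0."""
    zs = []
    for x in xs:
        v = list(x)
        for z in zs:
            proj = S(z, v)
            v = [vi - proj * zi for vi, zi in zip(v, z)]
        norm = math.sqrt(S(v, v))
        zs.append([vi / norm for vi in v])
    return zs


# Hypothetical data
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [9.1, 7.9, 19.2, 17.8, 29.0, 28.1]

x1, x2, y = centered(X1), centered(X2), centered(Y)
z1, z2 = orthonormalize([x1, x2])
b1p, b2p = S(z1, y), S(z2, y)          # orthogonalized coefficients b'_i
SSR_orth = b1p ** 2 + b2p ** 2

# Reduction from the ordinary normal equations (two variates, solved directly)
a11, a12, a22 = S(x1, x1), S(x1, x2), S(x2, x2)
g1, g2 = S(x1, y), S(x2, y)
d = a11 * a22 - a12 ** 2
b1 = (g1 * a22 - g2 * a12) / d
b2 = (g2 * a11 - g1 * a12) / d
SSR_direct = b1 * g1 + b2 * g2

print(f"SSR from orthogonal coefficients: {SSR_orth:.6f}")
print(f"SSR from normal equations:        {SSR_direct:.6f}")
print(f"reduction due to x1 alone:        {b1p ** 2:.6f}")
```

Because each z depends only on the x's up to its own index, the first squared coefficient is unchanged when further variates are added, which is the property used later for the added-reduction test.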

† This section can be omitted if the reader is not interested in a more theoretical
presentation than in Sec. 14.1.
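Before continuing, we should show that our solution to the least-squares
equations is unique. We know that

\[
\hat{Y} = \bar{Y} + \sum_i b_ix_i = \bar{Y} + \sum_i b_i''z_i,
\]

where \(b_i'' = \sum_j w_{ji}'b_j\) and the \(w_{ji}'\) are the coefficients obtained when the
\(\{x_j\}\) are solved backward in terms of the \(\{z_i\}\). For example,

\[
b_1'' = w_{11}'b_1 + w_{21}'b_2 + \cdots + w_{r1}'b_r.
\]

But \(b_i'' = b_i'\) because both were derived by minimizing \(Se^2\). There can
be but one minimum because the \(Se^2\) equation is quadratic in the b's.
Hence

\[
\sum (b_i')^2 = \sum b_iSx_iy.
\]

We shall use the orthogonalized regression coefficients \(\{b_i'\}\) in the theory
which follows, always remembering that any \(x_j\) is a function of only
\(z_1, z_2, \ldots, z_j\).

Because of the relationship

\[
Sz_iz_j = \begin{cases} 0, & i \ne j,\\ 1, & i = j, \end{cases}
\]

\[
Sz_i\Bigl(Y - \mu - \sum_{j=1}^{r} \beta_j'z_j\Bigr) = Sz_i\epsilon = Sz_iY - \sum_{j=1}^{r} \beta_j'Sz_iz_j = b_i' - \beta_i'.
\]

Hence

\[
(b_i' - \beta_i') = Sz_i\epsilon,
\]

a completely orthogonal form in the \(\epsilon\), which are NID(0, \(\sigma^2\)). From our
theory on orthogonal forms, we know that

\[
\begin{aligned}
E(b_i' - \beta_i') &= Sz_iE(\epsilon) = 0 \quad \text{or} \quad E(b_i') = \beta_i',\\
\sigma^2(b_i') &= E[(b_i' - \beta_i')^2] = \sigma^2Sz_i^2 = \sigma^2,\\
E[(b_i' - \beta_i')(b_j' - \beta_j')] &= 0, \qquad i \ne j.
\end{aligned}
\]

Since the \(\epsilon\) are NID, so are \((b_i' - \beta_i')\). Also, \(t = (b_i' - \beta_i')/s\), where
\(E(s^2) = \sigma^2\). Or in terms of the original variates \(\{x_i\}\),

\[
t = (b_i - \beta_i)/s\sqrt{c_{ii}}, \qquad i = 1, 2, \ldots, r.
\]

Next we need to determine \(s^2\).

\[
\begin{aligned}
\text{SSE} &= S\Bigl(y - \sum_i b_i'z_i\Bigr)^2 = S\Bigl[(\epsilon - \bar{\epsilon}) - \sum_i (b_i' - \beta_i')z_i\Bigr]^2\\
&= S(\epsilon - \bar{\epsilon})^2 - \sum_i (b_i' - \beta_i')^2,\\
E(\text{SSE}) &= (n - 1)\sigma^2 - r\sigma^2 = (n - r - 1)\sigma^2.
\end{aligned}
\]

Hence if we let \(s^2 = \text{SSE}/(n - r - 1)\), \(E(s^2) = \sigma^2\).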
If it is desired to make a test of the hypothesis \(\{\beta_i' = 0\}\), it would be
useful to be able to use the F ratio as a test criterion. In order to use F,
it is necessary to find two quantities distributed as \(\chi^2\sigma^2\) such that the
ratio of the two will test the hypothesis \(\{\beta_i' = 0\}\) against the alternative
\(\{\beta_i' \ne 0\}\). It would appear from the above that SSE is distributed as
\(\chi^2\sigma^2\) with (n − r − 1) degrees of freedom. A more rigorous method of
proof is the following:

Augment the existing set of r completely orthogonal variates \(\{z_i\}\) by
(n − r − 1) others, which we shall designate as \(\{p_j\}\) (j = 1, 2, . . . ,
n − r − 1). The estimation equation will then be

\[
Y = \mu + \sum_{i=1}^{r} \beta_i'z_i + \epsilon = \bar{Y} + \sum_i b_i'z_i + \sum_j c_j'p_j,
\]

where \(c_j'\) is the regression coefficient for \(p_j\) \([E(\sum c_j'p_j) = 0]\). Note that with
these n orthogonal variates, there is no residual sum of squares. Hence

\[
S\Bigl[y - \sum_i b_i'z_i - \sum_j c_j'p_j\Bigr]^2 = 0,
\]

\[
\begin{aligned}
0 &= S\Bigl[y - \sum_i \beta_i'z_i - \sum_i (b_i' - \beta_i')z_i - \sum_j c_j'p_j\Bigr]^2\\
&= S\Bigl[y - \sum_i \beta_i'z_i\Bigr]^2 - \sum_i (b_i' - \beta_i')^2 - \sum_j (c_j')^2,
\end{aligned}
\]

or

\[
\sum_{j=1}^{n-r-1} (c_j')^2 = S(\epsilon - \bar{\epsilon})^2 - \sum_{i=1}^{r} (b_i' - \beta_i')^2 = \text{SSE}.
\]

Hence we have broken SSE into (n − r − 1) orthogonal squares.
Furthermore,

\[
\begin{aligned}
c_j' &= S(yp_j) = S(\epsilon p_j) + \sum_i \beta_i'S(z_ip_j),\\
E(c_j') &= \sum_i \beta_i'S(z_ip_j) = 0,\\
\sigma^2(c_j') &= E[S(\epsilon p_j)]^2 = \sigma^2, \qquad \sigma(c_j'c_k') = 0, \quad j \ne k,\\
\sigma(c_j'b_i') &= E[c_j'(b_i' - \beta_i')] = E[S(\epsilon p_j)S(\epsilon z_i)] = 0.
\end{aligned}
\]

Since the \(\{c_j'\}\) are orthogonal linear forms in NID(0, \(\sigma^2\)) variates, they
are NID(0, \(\sigma^2\)). Hence the \((c_j')^2\) are independently distributed as \(\chi^2\sigma^2\)
with 1 degree of freedom each, or SSE is so distributed with (n − r − 1)
degrees of freedom.†

We have already shown that the \((b_i' - \beta_i')\) are NID(0, \(\sigma^2\)); hence,
\((b_i' - \beta_i')^2\) is distributed as \(\chi^2\sigma^2\) with 1 degree of freedom, and \(\sum (b_i')^2\) will
be distributed as \(\chi^2\sigma^2\) with r degrees of freedom, under the null hypothesis
\(\{\beta_i' = 0\}\).† Hence

\[
F = \frac{\sum_i (b_i')^2 / r}{\sum_j (c_j')^2 / (n - r - 1)}
\]

can be used to test the null hypothesis. Also, we see that the single-tailed
F test should be used if the alternative hypothesis is \(\{\beta_i' \ne 0\}\), because
the expected value of the denominator is still \(\sigma^2\), while the expected value
of the numerator is

\[
\sum_i \bigl\{E[(b_i' - \beta_i')^2] + 2\beta_i'E(b_i' - \beta_i') + (\beta_i')^2\bigr\}/r
= \Bigl[r\sigma^2 + \sum_i (\beta_i')^2\Bigr]/r = \sigma^2 + \sum_i (\beta_i')^2/r > \sigma^2.
\]

† Since \(\{c_j'\}\) and \(\{b_i'\}\) are independent of one another, SSE and SSR are independent
of one another. This is a necessary condition for the statements concerning F.

There are many conditions under which it would be desirable to assume
several of the \(\beta_i' \ne 0\). For a general test, we shall assume
\(\{\beta_1', \ldots, \beta_k' \ne 0\}\) and test the null hypothesis that \(\{\beta_{k+1}', \ldots, \beta_r' = 0\}\).

We have shown that SSE, found by fitting \(\{b_i'\}\) for i = 1, 2, . . . , r,
is independent of the hypotheses for \(\{\beta_i'\}\) and that the \((b_i' - \beta_i')^2\) are
independently distributed as \(\chi^2\sigma^2\) with 1 degree of freedom each. Hence
under the null hypothesis that \(\{\beta_{k+1}', \ldots, \beta_r' = 0\}\),

\[
\sum_{i=k+1}^{r} (b_i')^2
\]

is distributed as \(\chi^2\sigma^2\) with (r − k) degrees of freedom. We know that the
reduction in the residual sum of squares due to fitting r \(\{b_i'\}\) is given by

\[
\sum_{i=1}^{r} (b_i')^2
\]

and that due to fitting the first k \(\{b_i'\}\) is

\[
\sum_{i=1}^{k} (b_i')^2.
\]

Hence

\[
\sum_{i=k+1}^{r} (b_i')^2
\]

is the additional reduction in sum of squares gained by using \(\{b_{k+1}', \ldots, b_r'\}\)
in the regression equation after first using \(\{b_1', \ldots, b_k'\}\). It should be
emphasized that, by use of the orthogonal transformation, every \(b_i'\) can be
computed independently of all the others. Hence the reduction due to
any one of the regression coefficients is independent of whether any others
have been used and is independent of any assumptions made about the
expected values of these others.

The major difficulty facing us at this stage is to transfer these results on
the additional reduction due to the last (r − k) fixed variates back to the
original \(\{b_i, \beta_i\}\) setup, where the value of any regression coefficient depends
upon all the others included in the analysis. For example, the values of
\(\{b_1, b_2, \ldots, b_k\}\) as estimated from all the r \(\{x_i\}\) will not be the same as
the \(\{b_1'', b_2'', \ldots, b_k''\}\) as estimated from the first k \(\{x_i\}\). In order to use
our orthogonal results, the following conditions must hold:

(1) If we let \(R_r\) be the reduction in the residual sum of squares due to
the first r \(\{x_i\}\) and \(R_k\) that due to the first k \(\{x_i\}\), then \((R_r - R_k)\) must
be distributed as \(\chi^2\sigma^2\) with (r − k) degrees of freedom under the null
hypothesis \(\{\beta_{k+1}, \beta_{k+2}, \ldots, \beta_r = 0\}\).

(2) The residual sum of squares \(Sy^2 - R_r\) is independently distributed
as \(\chi^2\sigma^2\) with (n − r − 1) degrees of freedom.

We have shown that

\[
\sum_{i=1}^{k} (b_i' - \beta_i')^2
\]

is distributed as \(\chi^2\sigma^2\) with k degrees of freedom,

\[
\sum_{i=k+1}^{r} (b_i' - \beta_i')^2
\]

as \(\chi^2\sigma^2\) with (r − k) degrees of freedom, and

\[
\text{SSE} = S(\epsilon - \bar{\epsilon})^2 - \sum_{i=1}^{r} (b_i' - \beta_i')^2 = Sy^2 - \sum_{i=1}^{r} (b_i')^2
\]

as \(\chi^2\sigma^2\) with (n − r − 1) degrees of freedom, all independent of one
another.

But we have also shown that both methods of deriving the regression
coefficients minimize \(Se^2\) and hence produce the same SSE. Hence

\[
Sy^2 - R_r = \text{SSE}
\]

is distributed as \(\chi^2\sigma^2\) with (n − r − 1) degrees of freedom. Also

\[
R_r = \sum_{i=1}^{r} (b_i')^2.
\]
If we solve backward for the $\{x_i\}$ in terms of the $\{z_i\}$, we find that

$$x_i = w_{i1}'z_1 + w_{i2}'z_2 + \cdots + w_{ii}'z_i,$$

where the $w_{ij}'$ are functions of the $w_{ij}$. For example,

$$w_{11}' = 1/w_{11}, \quad w_{21}' = -(w_{21}/w_{11}w_{22}), \quad w_{22}' = 1/w_{22}.$$

Since $\sum \beta_jx_j = \sum \beta_i'z_i$ and by equating the coefficients of $\{z_i\}$, we find that

$$\beta_i' = \sum_{j=i}^{r} w_{ji}'\beta_j.$$

This shows that the null hypothesis $\{\beta_{k+1}, \ldots, \beta_r = 0\}$ means that
$\{\beta_{k+1}', \ldots, \beta_r' = 0\}$ because the $\beta_i'$ are functions of only $\{\beta_i, \beta_{i+1}, \ldots, \beta_r\}$.

Hence $\sum_{i=k+1}^{r} (b_i')^2$ is distributed as $\chi^2\sigma^2$ with $(r - k)$ degrees of freedom
under the null hypothesis for the $\{\beta_i\}$.

Now $R_k$ is the reduction due to the first $k$ $\{x_i\}$. But the first $k$ $\{z_i\}$ are
functions of only the first $k$ $\{x_i\}$. For example,

$$z_k = \sum_{j=1}^{k} w_{kj}x_j.$$

Again the reduction due to using the $\{x_i\}$ or $\{z_i\}$ must be the same, so that
$R_k = \sum_{i=1}^{k} (b_i')^2$. But since $R_r = \sum_{i=1}^{r} (b_i')^2$, the added reduction after the
first $k$ $\{x_i\}$ is given by

$$R_r - R_k = \sum_{i=k+1}^{r} (b_i')^2.$$

It might be added that the $\{z_i\}$ behave just like orthogonal polynomials,
which will be discussed in Chap. 16.
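The result of this section can be checked numerically. The following is only a sketch under stated assumptions: the data are simulated (they are not from the text), and the orthogonal variates $z_i$ are obtained from a QR decomposition of the centered $x$'s, which is one convenient way of producing variates with $Sz_i^2 = 1$ and $Sz_iz_j = 0$ in which $z_i$ depends only on $x_1, \ldots, x_i$. The names `reduction` and `b_prime` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)          # simulated data, for illustration only
n, r, k = 30, 4, 2

X = rng.normal(size=(n, r))
X[:, 3] += 0.8 * X[:, 1]                # make the fixed variates correlated
Y = 1.0 + X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=n)

x = X - X.mean(axis=0)                  # deviations from the means
y = Y - Y.mean()

# Columns of Z are orthonormal; column i is a function of x_1, ..., x_i only.
Z, _ = np.linalg.qr(x)
b_prime = Z.T @ y                       # b'_i = S z_i y, since S z_i^2 = 1

def reduction(num_vars):
    """Reduction in sum of squares due to the first num_vars x's."""
    coef, *_ = np.linalg.lstsq(x[:, :num_vars], y, rcond=None)
    fitted = x[:, :num_vars] @ coef
    return float(fitted @ fitted)

R_r = reduction(r)
R_k = reduction(k)
added = float(np.sum(b_prime[k:] ** 2))

print("R_r - R_k          :", R_r - R_k)
print("sum of last b'_i^2 :", added)
```

The two printed values agree to rounding error, which is the statement $R_r - R_k = \sum_{i=k+1}^{r} (b_i')^2$.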

14.4. The Regression Problem When Certain Assumptions Are
Relaxed. Suppose we relax the assumptions of normality and homoscedasticity
in the previous sections of this chapter. For this case, David
and Neyman give a proof of the following theorem on least squares.
Given

(i) $Y_1, Y_2, \ldots, Y_n$ are independent.

(ii) $E(Y_j) = \sum_{i=1}^{r} \beta_iX_{ij}$ and the $X$'s are fixed variates.†

(iii) Out of the $n$ equations in (ii), it is possible to select at least one
system of $r$ equations soluble with respect to the $\beta$'s.

(iv) The variances of the $Y_j$ satisfy the relationship

$$\sigma_j^2 = \sigma^2/w_j, \quad j = 1, 2, \ldots, n,$$

where $\sigma^2$ may be unknown but the $\{w_j\}$ are known positive constants.

† If it is desired to use the intercept $\alpha$ in the equation, set $X_{1j} = 1$.

Then

(a) The best linear estimate of $Y$ is

$$\hat{Y} = \sum_{i=1}^{r} b_iX_i,$$

where the $b$'s are obtained by minimizing the weighted sum of squares

$$S\,w(Y - \hat{Y})^2.$$

In the case of a straight line,

$$\hat{Y} = \bar{Y}_w + b(X - \bar{X}_w),$$

where $X_1 = 1$, $X_2 = X$, $\bar{Y}_w = SwY/Sw$, and $\bar{X}_w = SwX/Sw$. Also,
$b = Swxy/Swx^2$, where $x = X - \bar{X}_w$ and $y = Y - \bar{Y}_w$, and $\bar{X}_w$ and $\bar{Y}_w$
are called weighted means and $b$ a weighted regression coefficient.
Often the weights are adjusted so that $Sw = 1$.

Example 14.2. The following data are taken from an experiment on
soybeans with three nitrogen treatments plus a check treatment (no
nitrogen) to find out the relation between the mean yield of soybeans
per acre (Y) and amount of nitrogen in pounds per acre (X):

Lb N per acre (X)         0      47      94     157
Bu beans per acre (Y)    14.7    14.6    17.8    22.1
No. of plots             14       7       7       7
w                         2       1       1       1

Since 14 plots were used for the 0 level of nitrogen and 7 plots for the
three treatments, the variance of the mean yield for the 0 nitrogen plots
was one-half that of the mean yields for the three treatments; so weights
of (2,1,1,1) were used in estimating the regression of mean yield on
amount of nitrogen per acre.

The calculations needed were

$$\bar{X}_w = 59.60, \quad \bar{Y}_w = 16.78,$$
$$Swx^2 = 17{,}933.20, \quad Swxy = 828.66, \quad Swy^2 = 42.75,$$
$$b = 0.0462,$$
$$\hat{Y} = 16.78 + 0.0462(X - 59.60).$$

In this case the estimate of variance was taken from the 13 + 3(6) = 31
degrees of freedom within treatments: $s^2 = 7.56$. Hence the estimated
variance of each of the $Y$'s, which were means of $7w$ plots, was

$$\frac{7.56}{7w} = \frac{1.08}{w}.$$

See Exercise 14.12 for another estimate of the variance, using these 31
degrees of freedom plus the 2 degrees of freedom for deviations from
regression.
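The arithmetic of Example 14.2 can be retraced with a few lines of code. This is a minimal sketch using only the four treatment means and the weights (2,1,1,1) given above; the variable names are illustrative and are not part of the text.

```python
# Weighted regression of mean yield on nitrogen (Example 14.2).
X = [0.0, 47.0, 94.0, 157.0]      # lb N per acre
Y = [14.7, 14.6, 17.8, 22.1]      # bu beans per acre (treatment means)
w = [2.0, 1.0, 1.0, 1.0]          # weights proportional to number of plots

Sw = sum(w)
Xbar_w = sum(wi * xi for wi, xi in zip(w, X)) / Sw
Ybar_w = sum(wi * yi for wi, yi in zip(w, Y)) / Sw

# Weighted sums of squares and products of deviations from the weighted means.
Swx2 = sum(wi * (xi - Xbar_w) ** 2 for wi, xi in zip(w, X))
Swxy = sum(wi * (xi - Xbar_w) * (yi - Ybar_w) for wi, xi, yi in zip(w, X, Y))
Swy2 = sum(wi * (yi - Ybar_w) ** 2 for wi, yi in zip(w, Y))

b = Swxy / Swx2                   # weighted regression coefficient

# Estimated variance of each mean: s^2 / (7w), with s^2 = 7.56 from the text.
var_of_means = [7.56 / (7.0 * wi) for wi in w]

print(Xbar_w, Ybar_w, Swx2, Swxy, Swy2, b)
print(var_of_means)
```

The weighted means and sums reproduce the values printed in the example to the accuracy shown there.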


Table 14.1
Estimates of σ²(Y) for Example 13.1†

                          All incomes                    Income < $4,000
Class   X        Mean X    Sn    s²(Y)     w        Mean X    Sn    s²(Y)     w
  1     39-44     42.3     36     .366    2.732      42.3     36     .366    2.732
  2     45-46     45.1     54     .457    2.188      45.1     54     .457    2.188
  3     47-48     47.7    120     .431    2.320      47.7    119     .332    3.012
  4     49-50     49.9     68     .314    3.185      49.9     68     .314    3.185
  5     51-52     51.4     70     .545    1.835      51.4     70     .545    1.835
  6     53-54     53.4    108     .587    1.704      53.4    108     .587    1.704
  7     55-56     55.4     91     .435    2.299      55.4     91     .435    2.299
  8     57-60     58.4     95    1.111    0.900      58.4     93     .837    1.195
  9     61-64     62.3     73    1.807    0.553      62.3     68     .678    1.475
 10     65-70     67.4     68    1.845    0.542      67.3     61     .623    1.605
 11     71-76     73.6     67    2.727    0.367      73.5     63    1.063    0.941
 12     77-91     80.8     59    8.254    0.121      80.1     45     .894    1.119
Total                     909                                 876

† σ²(Y) measures the variation of net farm incomes about their means. Sn is
the number of farmers for each class (summing the values of n in Table 13.1). The
number of degrees of freedom for each s²(Y) is Sn less the number of X's for the class
(total of 860 and 830, respectively). w = 1/s²(Y).

Example 14.3. It appeared from the graph of the data of Example 13.1
(Fig. 13.3) that the variability about the regression line tended to increase
with increasing X. In order to investigate this matter, the data were
grouped into 12 classes, and a separate estimate of variance was computed
for each group. This was done both for all the income data and for that
portion of the data after eliminating all incomes of over $4,000. These
estimates of variance are presented in Table 14.1.

In Table 14.1, $s^2(Y)$ was computed by adding together

$$Sy^2 = S(Y - \bar{Y})^2$$

for each X in a class and dividing this sum by the total degrees of freedom
for the group. For example, with the first group (X from 39 to 44),
we had the following distribution:

  X       n     d.f.     Sy²      s²(Y)
  39      3      2      1.213     .607
  41      3      2       .341     .170
  42     18     17      8.325     .490
  43      1      0      0
  44     11     10      1.475     .148
Total    36     31     11.354     .366

The data were grouped like this in order to smooth out the estimates of
$\sigma^2(Y)$ by increasing the number of degrees of freedom for each estimate.
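The pooling for the first class can be written out directly. This is a small sketch using the figures in the distribution above; the names are illustrative only.

```python
# Pooled estimate of variance for class 1 (X from 39 to 44), Table 14.1.
n_per_X = [3, 3, 18, 1, 11]                    # number of farmers at each X
Sy2_per_X = [1.213, 0.341, 8.325, 0.0, 1.475]  # S(Y - Ybar)^2 at each X

df = sum(n - 1 for n in n_per_X)               # degrees of freedom: Sn less number of X's
s2 = sum(Sy2_per_X) / df                       # pooled s^2(Y)
w = 1.0 / round(s2, 3)                         # weight from the rounded s^2, as tabulated

print(df, round(s2, 3), round(w, 3))
```

The single observation at X = 43 contributes no degrees of freedom, which is why the divisor is 31 rather than 36.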

A weighted regression of Y on X can then be computed using as weights
the values of w given in Table 14.1; for example, any observation with
X from 39 to 44 will receive a weight of 2.73. These weights are applied
to the data presented in Table 13.1. Since the same weights are used
for many observations, a short-cut computing procedure can be used as
follows:

(i) Compute for each class: $S(nX)$, $S(SY)$, $S(nX^2)$, $S[X(SY)]$, and
$S[(SY)^2/n]$. It might be easier to compute $\bar{Y}$ for each class and make
the last computation as $S[\bar{Y}(SY)]$.

(ii) Multiply each sum in (i) by the corresponding weight, $w$, in Table
14.1, and sum these weighted values over the 12 classes, for example,
$S(w) = S[w(Sn)]$, $S(wX) = S[w(SnX)]$, $S(wX^2) = S[w(SnX^2)]$.

(iii) Adjust the sums in (ii) for the means, for example,

$$S(wx^2) = S(wX^2) - [S(wX)]^2/S(w).$$

(iv) $b = S(wxy)/S(wx^2)$; $\mathrm{SSR} = [S(wxy)]^2/S(wx^2)$.

There is some question as to the proper error term to use in this case,
because the weights were derived from the data. The simplest procedure
is to use

$$\mathrm{SSE} = S(wy^2) - \mathrm{SSR},$$

with $N - 2$ degrees of freedom ($N$ = 909 or 876). In this case

$$S(wy^2) = (N - m) + S[wS(\bar{Y}SY)] - [S(wY)]^2/S(w),$$

where $m$ is the number of X's in Table 13.1 ($m$ = 49 or 46) and the sums
are taken from step (ii) above. In this case $s^2 = \mathrm{SSE}/(N - 2)$, and
$F = \mathrm{SSR}/s^2$ can be used to test for the significance of the regression of
Y on X. In Sec. 25.2, we shall consider another estimate of the error
variance.

The sums in (i) for each class and the weighted sums (ii) over all
classes are presented in Table 14.2.


Table 14.2

Class     S(nX)     S(SY)      S(nX²)     S(XSY)    S(ȲSY)      w
  1       1,523      26.2      64,503      1,106      19.7     2.732
  2       2,433      52.5     109,623      2,365      51.1     2.188
  3       5,726     110.2     273,250      5,258     101.2     2.320
  4       3,383      55.0     168,317      2,739      45.2     3.185
  5       3,601      76.2     185,263      3,922      83.3     1.835
  6       5,769     129.7     308,187      6,939     159.9     1.704
  7       5,044     113.1     279,604      6,278     144.6     2.299
  8       5,546     139.3     323,872      8,139     206.6      .900
  9       4,548     118.9     283,438      7,423     204.1      .553
 10       4,580     130.4     308,690      8,796     252.3      .542
 11       4,930     152.8     362,964     11,270     380.7      .367
 12       4,769     211.7     386,097     17,399   1,112.6      .121
Sum†     73,953   1,581.6   3,889,486     85,013   2,012.0

† Weighted by weights, w. S(wSn) = 1,427.63.


The sums adjusted for the mean are

$$S(wx^2) = 3{,}889{,}486 - 3{,}830{,}857 = 58{,}629,$$
$$S(wxy) = 85{,}013 - 81{,}929 = 3{,}084,$$
$$S(wy^2) = 860 + 2{,}012 - 1{,}752 = 1{,}120.$$

Hence

$$b = S(wxy)/S(wx^2) = 0.05260,$$
$$\mathrm{SSR} = [S(wxy)]^2/S(wx^2) = 162,$$
$$\mathrm{SSE} = S(wy^2) - \mathrm{SSR} = 958,$$
$$s^2 = \mathrm{SSE}/907 = 1.056,$$
$$\hat{Y} = 1.108 + 0.0526(X - 51.80).$$
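Steps (iii) and (iv) of the short-cut procedure can be retraced from the weighted totals in the last line of Table 14.2. The sketch below starts from those printed totals (so it inherits their rounding) and uses N = 909 and m = 49 for all incomes; the variable names are illustrative.

```python
# Weighted totals over the 12 classes (last line of Table 14.2, all incomes).
S_w = 1427.63          # S(w Sn)
S_wX = 73953.0         # S(wX)
S_wY = 1581.6          # S(wY)
S_wX2 = 3889486.0      # S(wX^2)
S_wXY = 85013.0        # S(wXY)
S_wYbarSY = 2012.0     # S[w S(Ybar SY)]
N, m = 909, 49         # farmers and distinct X values in Table 13.1

# Step (iii): adjust for the weighted means.
S_wx2 = S_wX2 - S_wX ** 2 / S_w
S_wxy = S_wXY - S_wX * S_wY / S_w
S_wy2 = (N - m) + S_wYbarSY - S_wY ** 2 / S_w

# Step (iv): regression coefficient, sums of squares, and F.
b = S_wxy / S_wx2
SSR = S_wxy ** 2 / S_wx2
SSE = S_wy2 - SSR
s2 = SSE / (N - 2)
F = SSR / s2

print(round(S_wx2), round(S_wxy), round(S_wy2))
print(round(b, 4), round(SSR), round(SSE), round(s2, 3), round(F, 1))
```

The term (N - m) appears because each within-X sum of squares, once multiplied by w = 1/s²(Y), contributes its degrees of freedom.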





EXERCISES†

14.1. Show that the variance of $\bar{Y}$ is $\sigma^2/n$ and that $\bar{Y}$ is independent of
each of the $b_i$.

14.2. (a) If we use the model $Y = \alpha + \sum \beta_iX_i + \epsilon$, show that

$$a = \bar{Y} - \sum b_i\bar{X}_i.$$

(b) What is the variance of $a$?

14.3. What changes should be made in the regression analysis if

$$E(Y) = \sum \beta_iX_i?$$

14.4. Given the regression model

$$Y = \mu + \sum_{i=1}^{r} \beta_ix_i + \epsilon = \hat{Y} + e.$$

Show that

(a) $e$ is uncorrelated with any of the $x_i$, that is, $Sx_ie = 0$ (for any $i$).
(b) $S\hat{Y}e = 0$.
(c) $Se^2 = \mathrm{SSE} = SY^2 - S\hat{Y}^2$.

14.5. How could you test the hypothesis that $\beta_i = \beta_j$?

14.6. Set up the model and normal equations for $r = 2$.

(a) Show that

$$c_{11} = \frac{Sx_2^2}{D\,Sx_1^2}, \quad c_{12} = c_{21} = -\frac{Sx_1x_2}{D\,Sx_1^2}, \quad c_{22} = \frac{1}{D},$$

where $D = Sx_2^2 - (Sx_1x_2)^2/Sx_1^2$.

(b) Show that

$$b_1 = \beta_1 + c_{11}Sx_1\epsilon + c_{12}Sx_2\epsilon.$$

Set up the same equation for $b_2$. Using these results, prove algebraically
that

$$\sigma^2(b_1) = c_{11}\sigma^2, \quad \sigma^2(b_2) = c_{22}\sigma^2, \quad \sigma(b_1b_2) = c_{12}\sigma^2.$$

(c) Prove that $b_1$ and $b_2$ can be estimated in a two-stage estimation
procedure:

(i) $y = b_1'x_1 + e_1$; $x_2 = b_xx_1 + e_x$.

(ii) $e_1 = b_2e_x + e_2 = b_2(x_2 - b_xx_1) + e_2$.

Determine $b_1'$, $b_x$, and $b_2$ by least squares, and show that

$$y = b_1'x_1 + b_2\left(x_2 - x_1\frac{Sx_1x_2}{Sx_1^2}\right) + e_2$$

and

$$b_1 = b_1' - b_2\frac{Sx_1x_2}{Sx_1^2}.$$

Hence the regression of $y$ on $x_1$ and $x_2$ can be regarded as the regression
on $x_1$ (with regression coefficient $b_1'$) and on $x_2$ adjusted for $x_1$ (with regression
coefficient $b_2$), where $x_2 - (Sx_1x_2/Sx_1^2)x_1$ is the value of $x_2$ adjusted
for $x_1$.

(d) Using the results in (c), show that

$$\mathrm{SSR} = \frac{(Sx_1y)^2}{Sx_1^2} + \frac{(Se_xy)^2}{Se_x^2} = z_1^2 + z_2^2.$$

(e) Finally show that

$$E(z_1^2) = \left(\beta_1 + \beta_2\frac{Sx_1x_2}{Sx_1^2}\right)^2 Sx_1^2 + \sigma^2, \quad E(z_2^2) = D\beta_2^2 + \sigma^2,$$

$$E(\mathrm{SSR}) = \beta_1^2Sx_1^2 + 2\beta_1\beta_2Sx_1x_2 + \beta_2^2Sx_2^2 + 2\sigma^2, \quad E(\mathrm{SSE}) = (n - 3)\sigma^2.$$

Set up the analysis-of-variance table to summarize these results.

(f) How would you test the null hypothesis that $\beta_2 = 0$? $\beta_1 = \beta_2 = 0$?

(g) Can you set up a two-stage estimating procedure to test the null
hypothesis that $\beta_1 = 0$? How would you fit these results into the
analysis-of-variance table?

(h) What are the variances of $\hat{Y}'$ and $Y'$ for predicting $Y$ from $X_1 = X_1'$,
$X_2 = X_2'$?

† Exercise 14.6 should be worked by everyone.

14.7. Given $Y = \mu + \beta_1x_1 + \beta_2x_2 + \epsilon$, but we estimate only

$$\hat{Y} = \bar{Y} + b_1'x_1.$$

Show that $b_1'$ is a biased estimate of $\beta_1$ and that $s^2$ is also a biased estimate
of $\sigma^2$.

14.8. Given $Y = \mu + \beta_1'x_1 + \epsilon$, but we estimate with

$$\hat{Y} = \bar{Y} + b_1x_1 + b_2x_2.$$

Determine whether or not $b_1$ and $s^2$ are unbiased estimates of $\beta_1'$ and $\sigma^2$,
respectively.

14.9. In the analysis of the data given in Exercise 13.4, it was thought
advisable to use as a second fixed variate ($X_2$) the value of $Y$ for the previous
year: 77.4, 87.4, 97.6, . . . , 110.7. Use the results in Exercise
14.6 to:

(a) Estimate the two regression coefficients and their standard errors.

(b) Set up the complete analysis of variance, as in (e) and (g).

(c) Estimate the average $Y$ for $X_1 = 100$ and $X_2 = 100$ and the
standard error of this estimate.

14.10. Show that the normal equations are also the maximum-likelihood
equations for estimating the $\beta$'s.

14.11. Use the orthogonal transformations of Sec. 14.3 for $r = 5$.
Note that $\sigma^2(b_i') = \sigma^2$ and $\sigma(b_i'b_j') = 0$. Show that

(a) $b_5 = w_{5,5}b_5'$ and $b_4 = w_{4,4}b_4' + w_{5,4}b_5'$.

(b) The added reduction in residual sum of squares due to the last two
fixed variates ($X_4$ and $X_5$) over the reduction due to the first three fixed
variates ($X_1$, $X_2$, and $X_3$) alone is given by

$$\frac{w_{5,5}^2b_4^2 - 2w_{5,4}w_{5,5}b_4b_5 + (w_{4,4}^2 + w_{5,4}^2)b_5^2}{w_{4,4}^2w_{5,5}^2} = \frac{c_{5,5}b_4^2 - 2c_{4,5}b_4b_5 + c_{4,4}b_5^2}{c_{4,4}c_{5,5} - c_{4,5}^2}.$$

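14.12. (a) Show that, for $r = 2$,

$$\mathrm{SSE}_w = Swy^2 - \frac{(Swxy)^2}{Swx^2}; \quad s_w^2 = \mathrm{SSE}_w/(n - 2).$$

What would be the value of $\mathrm{SSE}_w$ for Example 14.2?

(b) Also show that $\sigma^2(\bar{Y}_w) = \sigma^2/Sw$, $\sigma^2(b) = \sigma^2/Swx^2$,

$$\sigma^2(\hat{Y}') = \sigma^2\left[\frac{1}{Sw} + \frac{(X' - \bar{X}_w)^2}{Swx^2}\right].$$

(c) In Example 14.3, determine 95 per cent confidence limits for $\beta$
and for $E(Y')$ and $Y'$ when $X' = 80$. What weight do you use for
$X' = 80$?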

14.13. Use the data given in Tables 13.1 and 14.1 for incomes less 
than $4,000 to compute a weighted regression of Y on X. 

14.14. G. A. Baker furnishes an example on ovarian weights (in
milligrams) of rats receiving five dosages of Serum VII, 16 rats for each
dosage.

Dosage in rat units (X)       4        6        8       12       16
Mean weight (Y)             27.00    38.31    43.44    65.25    72.88
s²(Y)                        1.59     3.96     3.64    18.96    45.95
Smoothed weights (w)         7.72     2.16     1.00      .37      .19

The "smoothed weights" were found by smoothing the values of s(Y).

(a) Show that, if we use the "smoothed weights,"

$$\hat{Y} = -15.13 + 69.43 \log_{10} X,$$

where $\log_{10} X$ is used as the independent variate.

(b) Use as weights reciprocals of $s^2(Y)$ to derive another estimate of Y.

14.15. It is evident from Table 14.1 that the variances increase systematically
with X. If there is a perfect correlation between X and $\sigma(Y)$, it
can be shown that we should use a logarithmic relationship as was done
in Exercise 13.11 (see reference 11). For all incomes, $s(Y)$ actually
increases faster than X but not for incomes less than $4,000. The individual
values of $s^2(Z)$ and $s'^2(Z)$ are given below for the 12 groups in
Table 14.1, where $Z = \log (Y + 1)$ and $s'^2(Z)$ is used for the incomes
under $4,000. The variances have been multiplied by 10.

Group      1     2     3     4     5     6     7     8     9    10    11    12
s²(Z)    .196  .220  .199  .174  .188  .196  .152  .296  .434  .329  .354  .548
s'²(Z)   .196  .220  .182  .174  .188  .196  .152  .296  .336  .200  .285  .232

(a) Use Bartlett's test (Chap. 12) to test for unequal variances for
$s'^2(Z)$. Is there a significant upward trend for these variances?

(b) What do you conclude about $s^2(Z)$?

(c) How would you determine whether or not a log transformation
would be a useful one?

CHAPTER 15


COMPUTATIONAL METHODS AND METHODS OF ANALYSIS 
FOR A GENERAL REGRESSION MODEL 


15.1. Introduction. In general, if $r$ is large, some short-cut procedure
is needed to solve for the $c_{ij}$ values. Snedecor and R. A. Fisher discuss
the use of the C matrix and the Doolittle method of obtaining it. Waugh
and Dwyer and Dwyer present a variety of computational methods.
Dwyer outlines the theoretical background for matrix inversion and
presents numerous examples.

We shall present an example for $r = 4$, using some data collected by
the Southern Cooperative Groups to estimate the quantity of vitamin
B₂ in milligrams per gram for turnip greens ($Y$) from a knowledge of
the radiation in relative gram calories per square centimeter per minute
during the preceding half day of sunlight ($X_1$), average soil moisture
tension ($X_2$), air temperature in degrees Fahrenheit ($X_3$), and the product
($X_1 \cdot X_2$), or $X_4$. In all, 27 sets of observations were taken on these
variates. In order to simplify the computations, these variates were
coded as follows: $X_1$ and $X_2$ were divided by 100, $X_3$ was divided by 10,
and $X_4$ was divided by 10,000. In general it is advisable to code the
original data so that the $a_{ij}$ ($Sx_ix_j$) are reduced to values between 1 and
10, if possible. The coded data are presented in Table 15.1.

The matrices of the sums of squares and cross products are

$$A = \begin{bmatrix} 10.25767 & .03798 & 6.87167 & 1.17904 \\ & 1.11550 & -.06320 & 1.99828 \\ & & 15.94667 & .166956 \\ & & & 3.94418 \end{bmatrix}; \quad G = \begin{bmatrix} 31.6650 \\ -86.8374 \\ 46.7156 \\ -152.8797 \end{bmatrix}; \quad B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}.$$

The elements of the A matrix are represented as $a_{ij}$ and the G matrix as
$g_i$ ($Sx_iy$), while the elements of the B matrix are the estimates of the
population regression coefficients.
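As a point of comparison for the hand methods that follow, the normal equations AB = G can be solved directly by machine. The sketch below assumes NumPy is available and simply enters the A and G values printed above (filling in the symmetric lower half); it is not the Doolittle procedure of the text.

```python
import numpy as np

# Upper triangle of A as printed; the lower half is its mirror image.
upper = np.array([
    [10.25767, 0.03798,  6.87167,  1.17904],
    [0.0,      1.11550, -0.06320,  1.99828],
    [0.0,      0.0,     15.94667,  0.166956],
    [0.0,      0.0,      0.0,      3.94418],
])
A = upper + np.triu(upper, 1).T
G = np.array([31.6650, -86.8374, 46.7156, -152.8797])

B = np.linalg.solve(A, G)     # estimates b_1, ..., b_4
SSR = float(B @ G)            # reduction due to regression, sum of b_i g_i

print(np.round(B, 4))
print(round(SSR, 2))
```

Because A and G are given to only six or seven significant figures, the later digits of the computed coefficients should not be taken too seriously.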

Two methods of obtaining the inverse matrix will be presented, the
Doolittle method and the abbreviated Doolittle method. The latter is often
called the Gauss-Doolittle method.

Table 15.1

           Y        X1       X2       X3       X4
         110.4     1.76     .070      7.8      .123
         102.8     1.55     .070      8.9      .108
         101.0     2.73     .070      8.9      .191
         108.4     2.73     .070      7.2      .191
         100.7     2.56     .070      8.4      .179
         100.3     2.80     .070      8.7      .196
         102.0     2.80     .070      7.4      .196
          93.7     1.84     .070      8.7      .128
          98.9     2.16     .070      8.8      .151
          96.6     1.98     .020      7.6      .039
          99.4      .59     .020      6.5      .011
          96.2      .80     .020      6.7      .016
          99.0      .80     .020      6.2      .016
          88.4     1.05     .020      7.0      .021
          75.3     1.80     .020      7.3      .036
          92.0     1.80     .020      6.5      .036
          82.4     1.77     .020      7.6      .035
          77.1     2.30     .020      8.2      .046
          74.0     2.03     .474      7.6      .962
          65.7     1.91     .474      8.3      .905
          56.8     1.91     .474      8.2      .905
          62.1     1.91     .474      6.9      .905
          61.0      .76     .474      7.4      .360
          53.2     2.13     .474      7.6     1.009
          59.4     2.13     .474      6.9     1.009
          58.7     1.51     .474      7.5      .715
          58.0     2.05     .474      7.6      .971
Total  2,273.5    50.16    5.076    206.4     9.460
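The sums of squares and cross products can be recomputed from the coded data. The following sketch assumes NumPy and that each line of Table 15.1 is one set of observations in the order listed; `A`, `G`, and `Syy` are illustrative names for $Sx_ix_j$, $Sx_iy$, and $Sy^2$.

```python
import numpy as np

Y = np.array([110.4, 102.8, 101.0, 108.4, 100.7, 100.3, 102.0, 93.7, 98.9,
              96.6, 99.4, 96.2, 99.0, 88.4, 75.3, 92.0, 82.4, 77.1,
              74.0, 65.7, 56.8, 62.1, 61.0, 53.2, 59.4, 58.7, 58.0])
X1 = np.array([1.76, 1.55, 2.73, 2.73, 2.56, 2.80, 2.80, 1.84, 2.16,
               1.98, 0.59, 0.80, 0.80, 1.05, 1.80, 1.80, 1.77, 2.30,
               2.03, 1.91, 1.91, 1.91, 0.76, 2.13, 2.13, 1.51, 2.05])
X2 = np.repeat([0.070, 0.020, 0.474], 9)
X3 = np.array([7.8, 8.9, 8.9, 7.2, 8.4, 8.7, 7.4, 8.7, 8.8,
               7.6, 6.5, 6.7, 6.2, 7.0, 7.3, 6.5, 7.6, 8.2,
               7.6, 8.3, 8.2, 6.9, 7.4, 7.6, 6.9, 7.5, 7.6])
X4 = np.array([0.123, 0.108, 0.191, 0.191, 0.179, 0.196, 0.196, 0.128, 0.151,
               0.039, 0.011, 0.016, 0.016, 0.021, 0.036, 0.036, 0.035, 0.046,
               0.962, 0.905, 0.905, 0.905, 0.360, 1.009, 1.009, 0.715, 0.971])

X = np.column_stack([X1, X2, X3, X4])
x = X - X.mean(axis=0)        # deviations from the means
y = Y - Y.mean()

A = x.T @ x                   # a_ij = S x_i x_j
G = x.T @ y                   # g_i  = S x_i y
Syy = float(y @ y)            # S y^2

print(np.round(A, 5))
print(np.round(G, 4))
print(round(Syy, 2))
```

Working with deviations from the means, as here, avoids the loss of figures that comes from subtracting large correction terms.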


15.2. The Doolittle Method of Inverting a Matrix. This method is
used by Fisher and Snedecor. The computations are presented in
Table 15.2 with accompanying explanations.

The procedures followed in the computations in Table 15.2 were:

I. The original A matrix is written on the left and an identity matrix on
the right (zeros everywhere except 1's on the main diagonal), followed by a
check column (sum of all elements in each row). In all of the computations which
follow, the same procedure is followed with the check column. Then if
the computing was done correctly, the sum of each row will equal the
value in the check column.

II. Divide each row by the element in the $X_4$ column of the A matrix.
Be sure to carry at least five significant digits, and preferably six, in all
quotients. Remember that the important point is the number of significant
digits and not the number of decimal places.

III. Subtract the last line from all the others, dropping the $X_4$ column
in the left-hand matrix.

IV. Divide by the elements of the right column in this new left-hand
matrix.

V. Again subtract the last line from the others and now drop the $X_3$
column.

VI, VII. Same as above.

VIII. The first line gives the $c_{1j}$ values, which are computed by dividing
the elements in (VII) by the left-hand number, −9.20326.

IX. Substitute in turn the $c_{1j}$ values in either line of the left-hand
matrix of (VI) as follows to determine $c_{2j}$. For example,

$$c_{21} = \left\{\begin{array}{l} -2.01565 - (-14.08052)(.219015) \\ \text{or} \\ 0 - (-4.87726)(.219015) \end{array}\right\} = 1.06819.$$

Note that $c_{21}$ should equal $c_{12}$, except for rounding errors.

$$c_{22} = \left\{\begin{array}{l} 0 - (-14.08052)(1.06819) \\ \text{or} \\ 9.83081 - (-4.87726)(1.06819) \end{array}\right\} = 15.04066.$$

X. Next substitute the values of $c_{1j}$ and $c_{2j}$ in any of the equations in
(IV) and solve for $c_{3j}$ as in (IX) for $c_{2j}$. For example

$$c_{31} = .146590 - (1.45200)(.219015) - (-.081998)(1.06819) = -.083830,$$
$$c_{31} = 0 - (3.78493)(.219015) - (-.697558)(1.06819) = -.083832,$$
$$c_{31} = 0 - (.42798)(.219015) - (-.009272)(1.06819) = -.083830.$$

We note that two of these equations give the same result to five significant
digits (six decimal places), while the middle one deviates very slightly
from these two. We have used the value given in (VIII) for $c_{13}$. Rounding
errors are quite a problem in matrix-inversion calculations. Hence
it is advisable to carry several unnecessary digits at first in order to be
able to drop digits as the computing proceeds and end up with as many
as was thought necessary.

XI. Next substitute the values of $c_{1j}$, $c_{2j}$, and $c_{3j}$ in any of the equations
in (II), and solve for $c_{4j}$. For example,

$$c_{41} = .848148 - 8.70002c_{11} - .032213c_{21} - 5.82819c_{31} = -.603125,$$
$$c_{41} = 0 - .019006c_{11} - .558230c_{21} + .031627c_{31} = -.603110,$$
$$c_{41} = 0 - 41.15857c_{11} + .378543c_{21} - 95.51421c_{31} = -.603128,$$
$$c_{41} = 0 - .298932c_{11} - .506640c_{21} - .042331c_{31} = -.603111.$$

In this case two equations are out of line, and two almost agree with $c_{14}$.
In general it is a good practice to omit equations like the first and third,
because several of the elements are so large. These large coefficients
(such as 41 and 95) are subject to a much greater rounding error, because
extra digits are needed to give six-decimal accuracy. That is, we are
trying to calculate $c_{41}$ to six decimal places; hence, the individual multipliers
should be accurate to six decimal places. But for numbers like
41 and 95 to have six decimal places, eight significant digits are required,
and we cannot obtain eight significant digits in our computing unless
we use several more in the original A matrix.

The C matrix has now been completed, but the computer should have
some more checking besides the check column. The final check is to
find whether or not the product of the A and C matrices is the identity
matrix. That is

$$AC = D = I$$

where I contains only 1's in the main diagonal. To compute an element
in the $i$th row and $j$th column of AC, we find the sums of the products of
corresponding elements in the $i$th row of A and the $j$th column of C.
Let $a_{ik}$ be the elements in A and $c_{kj}$ those in C. Then the $(ij)$ element in
D is

$$d_{ij} = \sum_{k=1}^{4} a_{ik}c_{kj}.$$

We generally compute the diagonal elements $d_{ii}$ to see whether or not
they are nearly 1 (to the desired degree of accuracy). In the vitamin
B₂ example

$$d_{11} = (10.25767)(.219015) + \cdots + (1.17904)(-.603108) = 1.0000198,$$
$$d_{22} = (.03798)(1.06819) + \cdots + (1.99828)(-7.92605) = 1.0000380,$$
$$d_{33} = 1.0000014,$$
$$d_{44} = 1.0000057.$$

The last two diagonal elements are within the desired five-decimal-place
accuracy, but $d_{11}$ and $d_{22}$ only to four places. Hence if we use the present
C matrix, we cannot hope for more than four-place accuracy in the
$b$'s, and possibly only three-place accuracy.

XII. The sums of the cross products of the $x_i$ with $y$ are given here
($g_i = Sx_iy$).

XIII. $b_i = \sum_{j=1}^{4} c_{ij}g_j$. For example,

$$b_1 = (.219015)(31.6650) + \cdots + (-.603108)(-152.8797) = 2.4631.$$

At this stage it is advisable either to recompute the $b_i$ or to substitute
these values in the original normal equations. For example,

$$10.25767b_1 + \cdots + 1.17904b_4 = 31.6650.$$

Substituting the above values of the $b_i$, this sum is 31.6614, indicating
slight inaccuracies. More nearly exact values are given by use of C''
below.

XVI. $\mathrm{SSR} = \sum b_ig_i = (2.4631)(31.6650) + \cdots + (-1.3769)(-152.8797) = 6{,}908.04.$

$\mathrm{SSE} = Sy^2 - \mathrm{SSR} = 9{,}150.53 - 6{,}908.04 = 2{,}242.49.$

$s^2 = \mathrm{SSE}/22 = 101.93.$

XIV. The standard error of $b_i = \sqrt{c_{ii}s^2}$.

XV. $t = |b_i|/s(b_i)$.
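Steps XIII to XV can be carried through by machine as a check. The sketch below assumes NumPy and uses `np.linalg.inv` in place of the hand inversion, so the C matrix here is the library's inverse of the printed A matrix rather than the rounded one obtained in Table 15.2; $Sy^2 = 9{,}150.53$ and $n = 27$ are taken from the text.

```python
import numpy as np

upper = np.array([
    [10.25767, 0.03798,  6.87167,  1.17904],
    [0.0,      1.11550, -0.06320,  1.99828],
    [0.0,      0.0,     15.94667,  0.166956],
    [0.0,      0.0,      0.0,      3.94418],
])
A = upper + np.triu(upper, 1).T
G = np.array([31.6650, -86.8374, 46.7156, -152.8797])
Syy, n, r = 9150.53, 27, 4

C = np.linalg.inv(A)                 # inverse matrix (c_ij)
b = C @ G                            # b_i = sum_j c_ij g_j
SSR = float(b @ G)
s2 = (Syy - SSR) / (n - r - 1)       # error mean square, 22 degrees of freedom
se = np.sqrt(np.diag(C) * s2)        # s(b_i) = sqrt(c_ii s^2)
t = np.abs(b) / se

for i in range(r):
    print(f"b{i + 1} = {b[i]:10.4f}   s(b) = {se[i]:8.4f}   t = {t[i]:6.3f}")
```

The value of $t$ for $b_2$ falls short of the 5 per cent point for 22 degrees of freedom (about 2.07), a point taken up in Sec. 15.4.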

If it is desired to improve the results to more significant figures, we can
use an iterative device advanced by Hotelling. The procedure is as
follows:

(1) Compute the matrix (2 − AC), which is the same as AC, except
the diagonal elements are subtracted from 2 and the signs of all other
elements are reversed.

$$(2 - AC) = \begin{bmatrix} .9999802 & -.0000771 & .0000044 & .0000195 \\ -.0000035 & .9999620 & -.0000008 & -.0000046 \\ -.0000033 & -.0000092 & .9999986 & -.0000124 \\ -.0000079 & -.0000778 & -.0000013 & .9999943 \end{bmatrix}$$

(2) Now compute C(2 − AC) = C'', the new C matrix.

$$C'' = \begin{bmatrix} .219012 & 1.068180 & -.083828 & -.603104 \\ 1.068180 & 15.040626 & -.317704 & -7.926049 \\ -.083828 & -.317704 & .095668 & .181971 \\ -.603104 & -7.926049 & .181971 & 4.441777 \end{bmatrix}$$

(3) $b_i'' = [2.46332 \quad -75.3747 \quad 1.58369 \quad -1.37645]$.

(4) Substituting in the normal equations, we obtain as estimates of
the $g_i$

[31.6649  −86.8375  46.7156  −152.8800],

as compared with the exact values

[31.6650  −86.8374  46.7156  −152.8797].

(5) If we use these values of $b_i''$, we find that $\mathrm{SSR} = \sum b_i''g_i = 6{,}907.76$
as compared with 6,908.04 above. Hence there is no real difference in
the results by using the $b_i''$.
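One cycle of the iterative device can be sketched as follows. The starting matrix here is an assumption made for illustration: the library inverse of A rounded to four decimal places, standing in for a hand-computed C with rounding errors (the first-pass C of Table 15.2 is not reproduced). NumPy is assumed.

```python
import numpy as np

upper = np.array([
    [10.25767, 0.03798,  6.87167,  1.17904],
    [0.0,      1.11550, -0.06320,  1.99828],
    [0.0,      0.0,     15.94667,  0.166956],
    [0.0,      0.0,      0.0,      3.94418],
])
A = upper + np.triu(upper, 1).T
I = np.eye(4)

# Hypothetical rough inverse: deliberately rounded to mimic hand computation.
C0 = np.round(np.linalg.inv(A), 4)

# One improvement cycle: C'' = C (2 - AC).
C1 = C0 @ (2.0 * I - A @ C0)

err0 = np.abs(I - A @ C0).max()      # largest departure of AC from I, before
err1 = np.abs(I - A @ C1).max()      # and after the cycle

print(f"max |I - AC|  before: {err0:.2e}   after: {err1:.2e}")
```

If $E = I - AC$, the new product satisfies $I - AC'' = E^2$, so each cycle roughly squares the error as long as the starting C is reasonably close.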

15.3. The Abbreviated Doolittle Method. Now we shall present a
short-cut method of inverting a symmetric matrix ($a_{ij} = a_{ji}$), the
abbreviated Doolittle method, described by Dwyer.

Table 15.3
Forward Solution

               x1          x2          x3          x4           y         Check
I           10.25767    .037980     6.87167     1.17904      31.6650     50.0114
                        1.11550     -.063200    1.99828     -86.8374    -83.7488
                                    15.94667     .166956     46.7156     69.6377
                                                3.94418    -152.8797   -145.5912
II   A1j    10.25767    .037980     6.87167     1.17904      31.6650     50.0114
     B1j     1          .00370260    .669906     .114942      3.08696     4.87551
III  A2j                1.11536     -.088643    1.99391     -86.9546    -83.9340
     B2j                1           -.079475    1.78768     -77.9610    -75.2528
IV   A3j                            11.33625    -.464424     18.5923     29.4641
     B3j                             1          -.040968      1.64007     2.59910
V    A4j                                         .22516       -.3104      -.0852
     B4j                                        1            -1.3786      -.3784

Backward Solution (for c_ij)

c1j          .219003    1.06806     -.0838254   -.603037
c2j                    15.03894     -.317667   -7.92514
c3j                                  .0956668    .181951
c4j                                             4.44129

The procedures followed in the abbreviated Doolittle method computations
were:

I. The original A matrix with only the upper right corner is reproduced
plus the G column ($Sx_iy$) and a check column. For the check column,
we assume the entire A matrix is present. Since the A matrix is symmetrical,
the lower corner is an image of the upper corner. We shall call
these elements $a_{ij}$.

II. $A_{1j} = a_{1j}$, the first row of (I) is reproduced.

$$B_{1j} = A_{1j}/A_{11} = A_{1j}/10.25767.$$

III. $A_{2j} = a_{2j} - [A_{12}B_{1j} \text{ or } A_{1j}B_{12}]$, where $a_{2j}$ is the element in the
second row of (I) and $A_{12}B_{1j} = A_{1j}B_{12}$ except for rounding errors. As
we noted in commenting on rounding errors for the Doolittle method, it
is advisable to choose from these two ($A_{12}B_{1j}$ or $A_{1j}B_{12}$) the one for which
the two members are more nearly equal.

$$A_{22} = 1.11550 - (.037980)(.0037026) = 1.11536.$$

$$A_{23} = -.063200 - \left\{\begin{array}{l} (.037980)(.669906) = -0.088643. \\ (.0037026)(6.87167) = -0.088643. \end{array}\right.$$

$$B_{2j} = A_{2j}/A_{22}.$$

IV. $A_{3j} = a_{3j} - (A_{13}B_{1j} + A_{23}B_{2j}) = a_{3j} - (A_{1j}B_{13} + A_{2j}B_{23}).$

$$A_{33} = 15.94667 - [(6.87167)(.669906) + (-.088643)(-.079475)] = 11.33625.$$

$$A_{34} = .166956 - \left\{\begin{array}{l} (6.87167)(.114942) + (-.088643)(1.78768) = -0.464422. \\ (1.17904)(.669906) + (-.079475)(1.99391) = -0.464424. \end{array}\right.$$

$$B_{3j} = A_{3j}/A_{33}.$$

V. $A_{4j} = a_{4j} - (A_{14}B_{1j} + A_{24}B_{2j} + A_{34}B_{3j}) = a_{4j} - (A_{1j}B_{14} + A_{2j}B_{24} + A_{3j}B_{34}).$

$$A_{44} = 3.94418 - [(1.17904)(.114942) + (1.99391)(1.78768) + (.464424)(.040968)] = 0.22516.$$

$$B_{4j} = A_{4j}/A_{44}.$$

This completes the forward solution. If the experimenter desires to
compute only the $\{b_i\}$ and the over-all reduction in sum of squares due to
regression (SSR) without the individual $s^2(b_i)$ and $s(b_ib_j)$, he can make
these computations without determining the inverse matrix, as follows:

VI. $b_4 = B_{4y} = -1.3786.$

$b_3 = B_{3y} - B_{34}b_4 = 1.64007 + (.040968)(-1.3786) = 1.5836.$

$b_2 = B_{2y} - B_{23}b_3 - B_{24}b_4 = -77.9610 - (-.079475)(1.5836) - (1.78768)(-1.3786) = -75.3706.$

$b_1 = B_{1y} - B_{12}b_2 - B_{13}b_3 - B_{14}b_4 = 3.08696 - (.0037026)(-75.3706) - (.669906)(1.5836) - (.114942)(-1.3786) = 2.4636.$

$\mathrm{SSR} = \sum b_iSx_iy = (2.4636)(31.6650) + \cdots + (-1.3786)(-152.8797) = 6{,}907.74.$

If we let $b_i'$ be the estimate of the last $b$ in the Doolittle procedure when
$i$ fixed variates are used, we note that $B_{iy} = b_i'$; in other words, we note
that $B_{iy}$ will give the value of $b_i$ if this is the last $b$ to be computed. This
is called a completely adjusted $b$, since $x_i$ has been adjusted for all the
other $x$'s. In our case with $r = 4$, $b_4' = b_4$. But if only the first three
$x$'s were used, we would omit the $x_4$ column and the $A_4$ and $B_4$ rows, so
that the regression coefficient for $x_3$ would then equal $B_{3y}$. This would
be $b_3'$. We also note that $A_{1y} = Sx_1y$ and, proceeding as in Exercise
14.6, $A_{2y} = S(x_2 \text{ adjusted for } x_1)y$, $A_{3y} = S(x_3 \text{ adjusted for } x_1 \text{ and } x_2)y$,
and $A_{4y} = S(x_4 \text{ adjusted for } x_1, x_2, \text{ and } x_3)y$. Hence, using the same
method as in Exercise 14.6, we can consider the regression equation as†

$$y = b_1'x_1 + b_2'(x_2 \text{ adjusted for } x_1) + b_3'(x_3 \text{ adjusted for } x_1 \text{ and } x_2) + b_4'(x_4 \text{ adjusted for } x_1, x_2, \text{ and } x_3) + e.$$

Hence we can write

$$\mathrm{SSR} = \sum_{i=1}^{4} A_{iy}b_i' = \sum_{i=1}^{4} A_{iy}B_{iy} = (31.6650)(3.08696) + \cdots + (-.3104)(-1.3786) = 6{,}907.74.$$
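The forward solution and step VI can be sketched in code as below. This is an illustration of the recurrences $A_{ij} = a_{ij} - \sum_{k<i} A_{ki}B_{kj}$ and $B_{ij} = A_{ij}/A_{ii}$ carried in full machine precision, so the later digits differ slightly from the hand-rounded entries of Table 15.3. NumPy is assumed; `Arows` and `Brows` are illustrative names for the A and B lines of the table.

```python
import numpy as np

upper = np.array([
    [10.25767, 0.03798,  6.87167,  1.17904],
    [0.0,      1.11550, -0.06320,  1.99828],
    [0.0,      0.0,     15.94667,  0.166956],
    [0.0,      0.0,      0.0,      3.94418],
])
A = upper + np.triu(upper, 1).T
G = np.array([31.6650, -86.8374, 46.7156, -152.8797])
r = 4

M = np.hstack([A, G[:, None]])        # rows a_i1 ... a_ir | g_i  (last column is y)
Arows = np.zeros((r, r + 1))
Brows = np.zeros((r, r + 1))

# Forward solution: lines II to V of Table 15.3.
for i in range(r):
    for j in range(i, r + 1):
        Arows[i, j] = M[i, j] - sum(Arows[k, i] * Brows[k, j] for k in range(i))
    Brows[i, i:] = Arows[i, i:] / Arows[i, i]

# Step VI: back-substitution for the b_i, starting with b_r = B_ry.
b = np.zeros(r)
for i in reversed(range(r)):
    b[i] = Brows[i, r] - sum(Brows[i, j] * b[j] for j in range(i + 1, r))

SSR = float(np.sum(Arows[:, r] * Brows[:, r]))     # sum of A_iy B_iy

print(np.round(np.diag(Arows[:, :r]), 5))          # A_11, A_22, A_33, A_44
print(np.round(b, 4))
print(round(SSR, 2))
```

The products $A_{iy}B_{iy}$ are the successive additions to SSR as each fixed variate is brought in, which is why dropping the last variates leaves the earlier lines unchanged.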

The above results indicate that a decided saving in computation can
be made by use of the abbreviated Doolittle method if the experimenter
decides to omit the last $k$ fixed variates from the regression model; in
this case, the computations for the first $(r - k)$ variates need not be
changed. Of course the experimenter seldom knows which fixed variates
might be omitted before he starts the computations; hence, he does not
know which should be put last in the computing scheme. However,
in many cases, there are some fixed variates which he would like to omit
because they are costly or difficult to measure. In such a case he might
put these $k$ fixed variates last and omit them if they contributed little
extra to SSR. For example, the added contribution of $X_3$ and $X_4$ above
to SSR is only $(A_{3y}B_{3y} + A_{4y}B_{4y}) = 30.91$, with 2 degrees of freedom.

Despite the obvious errors of rounding in computing the $b_i$ by the
abbreviated Doolittle procedure, SSR is only .02 less than the value
computed after the use of the Hotelling iterative procedure on the Doolittle
solutions.

† $b_4'$ is often written as $b_{4.123}$, which reads the regression coefficient for $x_4$ adjusted for
$x_1$, $x_2$, and $x_3$. $b_3'$ would be $b_{3.12}$, and $b_2' = b_{2.1}$, while $b_1'$ is an unadjusted $b_1$.

If the computer needs the inverse (C) matrix, the computations proceed
as given for the so-called "backward" solution.

VII. First compute $c_{i4}$ values.

$c_{44} = 1/A_{44} = 1/.22516 = 4.44129,$

$c_{34} = -c_{44}B_{34} = -(4.44129)(-.040968) = .181951,$

$c_{24} = -c_{34}B_{23} - c_{44}B_{24} = (.181951)(.079475) - (4.44129)(1.78768) = -7.92514,$

$c_{14} = -c_{24}B_{12} - c_{34}B_{13} - c_{44}B_{14} = -.603037.$

Check by use of $a_{14}c_{14} + \cdots + a_{44}c_{44} = .99997.$

VIII. $c_{i3}$ values.

$c_{43} = c_{34} = .181951,$

$c_{33} = 1/A_{33} - c_{34}B_{34} = .0882126 - (.181951)(-.040968) = .0956668,$

$c_{23} = -c_{33}B_{23} - c_{34}B_{24} = (.0956668)(.079475) - (.181951)(1.78768) = -.317667,$

$c_{13} = -c_{23}B_{12} - c_{33}B_{13} - c_{34}B_{14} = -.0838254.$

Again check by use of $a_{13}c_{13} + \cdots + a_{43}c_{43} = 1.0000008.$

IX. $c_{i2}$ values.

$c_{42} = c_{24} = -7.92514$, $c_{32} = c_{23} = -.317667,$

$c_{22} = 1/A_{22} - c_{23}B_{23} - c_{24}B_{24} = .8965715 - (-.317667)(-.079475) + (7.92514)(1.78768) = 15.03894,$

$c_{12} = -c_{22}B_{12} - c_{23}B_{13} - c_{24}B_{14} = 1.06806$, and

$a_{12}c_{12} + \cdots + a_{42}c_{42} = .99993.$

X. $c_{11} = 1/A_{11} - c_{12}B_{12} - c_{13}B_{13} - c_{14}B_{14} = .0974880 - (1.06806)(.0037026) + (.0838254)(.669906) + (.603037)(.114942) = .219003$, and

$a_{11}c_{11} + \cdots + a_{41}c_{41} = 1.0000002.$

XI. The solutions for the $b_i$, SSR, and $s^2(b_i)$ proceed as with the
Doolittle method,

$b_i = [2.46334 \quad -75.3693 \quad 1.58356 \quad -1.37975]$, SSR = 6,907.79.

These $b$'s can be checked against those computed in (VI) above.
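The backward solution follows the recurrences $c_{kk} = 1/A_{kk} - \sum_{j>k} B_{kj}c_{jk}$ and $c_{ik} = -\sum_{j>i} B_{ij}c_{jk}$ for $i < k$, working from the last column to the first and using the symmetry of C. A sketch, assuming NumPy and full machine precision (so the results differ in the later digits from the hand-rounded values above); `Adiag` and `B` are illustrative names for the $A_{ii}$ and the $B_{ij}$ lines of Table 15.3.

```python
import numpy as np

upper = np.array([
    [10.25767, 0.03798,  6.87167,  1.17904],
    [0.0,      1.11550, -0.06320,  1.99828],
    [0.0,      0.0,     15.94667,  0.166956],
    [0.0,      0.0,      0.0,      3.94418],
])
A = upper + np.triu(upper, 1).T
r = A.shape[0]

# Forward solution (x columns only): A_ii and the B lines.
W = A.copy()
Adiag = np.zeros(r)
B = np.eye(r)
for i in range(r):
    Adiag[i] = W[i, i]
    B[i, i:] = W[i, i:] / W[i, i]
    for m in range(i + 1, r):
        W[m, m:] -= W[i, m] * B[i, m:]

# Backward solution: last column first, filling the symmetric element each time.
C = np.zeros((r, r))
for k in reversed(range(r)):
    for i in reversed(range(k + 1)):
        s = sum(B[i, j] * C[j, k] for j in range(i + 1, r))
        C[i, k] = (1.0 / Adiag[i] if i == k else 0.0) - s
        C[k, i] = C[i, k]

print(np.round(C, 6))
print(np.round(np.diag(A @ C), 7))     # check: diagonal of AC should be 1
```

The final line is the same check recommended in the text, the sums $a_{1j}c_{1j} + \cdots + a_{4j}c_{4j}$.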

Again we note that some of the $b_i$ are not very accurate but that SSR
is off only slightly in the last decimal place. The Hotelling iterative procedure
can be used to improve this C matrix also. Rounding errors
seem to be of more importance with the abbreviated method; hence, it
is especially advisable to carry extra places at first in order to secure the
desired accuracy at the end without having to use the Hotelling iterative
device to improve the accuracy. If it is known in advance that the C
matrix will be needed, an identity matrix can be carried on the right of
the abbreviated Doolittle matrix as with the Doolittle matrix and the
same computations used there as in steps I to VI; the computing procedure
is described on page 191 of reference 4.

15.4. Analysis of the Results. The experimenter will generally look
at the simple regressions of Y on each of the fixed variates first, in order
to obtain some idea of the usefulness of each of these fixed variates.
This does not mean that a fixed variate will be omitted if its simple regression
coefficient was not significantly different from zero; there may be
good theoretical reasons for adjusting all other fixed variates for one
which, of itself, was not too important as a predictor. But if the experimenter
had first run the simple regression of Y on $X_2$, he would have
discovered a highly significant relationship; yet the $t$ value was not significant
when all four X's were used. And he would be even more perplexed
when he studied the following analysis of variance for all four
fixed variates (based on the Doolittle solution):


Source of variation                        Degrees of freedom   Sum of squares   Mean square

Regression (4 variates)                             4              6,907.76        1,726.94
Regression (X2 only)                                1              6,759.98        6,759.98
Added reduction due to X1, X3, X4                   3                147.78           49.26
Error                                              22              2,242.77          101.94


From this analysis of variance, we conclude that the over-all reduction due to the use of the regression equation is highly significant

$$F = \frac{1{,}726.94}{101.94} = 16.94$$

and that the added reduction due to the three fixed variates other than $X_2$ is decidedly nonsignificant. Hence we are led to conclude that $X_2$ is a very important predictor and that the other predictors add nothing to the reliability of the estimate of vitamin B$_2$ content.
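The two variance ratios behind this conclusion can be checked with a few lines of arithmetic. The Python sketch below is an illustration only: it takes the sums of squares and degrees of freedom from the analysis of variance above, and the variable names are our own.

```python
# Sums of squares and degrees of freedom from the analysis-of-variance table.
ss_reg_all, df_reg_all = 6907.76, 4    # regression on all four fixed variates
ss_reg_x2, df_reg_x2 = 6759.98, 1      # regression on X2 only
ss_error, df_error = 2242.77, 22       # error

ms_error = ss_error / df_error         # error mean square

# Over-all test of the four-variate regression.
f_overall = (ss_reg_all / df_reg_all) / ms_error

# Test of the added reduction due to X1, X3, X4 after fitting X2.
ss_added = ss_reg_all - ss_reg_x2
df_added = df_reg_all - df_reg_x2
f_added = (ss_added / df_added) / ms_error

print(f"error mean square   = {ms_error:.2f}")
print(f"F (4 variates)      = {f_overall:.2f} with ({df_reg_all},{df_error}) d.f.")
print(f"F (added reduction) = {f_added:.2f} with ({df_added},{df_error}) d.f.")
```

A ratio below 1 for the added reduction cannot be significant at any level, which is what "decidedly nonsignificant" means here.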

Why then is $b_2$ not highly significant when all four $X$'s are used in the model? The answer lies in the peculiar nature of $X_4$, which is the product of $X_1$ and $X_2$, causing $X_4$ and $X_2$ to be highly correlated. Hence $b_2$ and $b_4$ are also highly correlated, so that the actual influence of $X_2$ on $Y$ is split into a part contributed by $b_2$ and another part contributed by $b_4$. It is impossible to interpret $b_2$ as the change in $Y$ when $X_2$ varies while holding the other $X$'s constant, because $X_4$ will vary when $X_2$ varies. The change in $\hat{Y}$ when $X_2$ varies is given by

$$z = \frac{\partial \hat{Y}}{\partial X_2} = b_2 + b_4X_1.$$

The average value of this change is

$$\bar{z} = b_2 + b_4\bar{X}_1 = -75.3747 - 1.3765(1.8578) = -77.9320.$$

This average change is almost the same as $b'_2$, the average change in $Y$ for a unit change in $X_2$, neglecting all other fixed variates. The estimated variance of this average change, $\bar{z}$, is

$$s^2(\bar{z}) = s^2(b_2) + (\bar{X}_1)^2s^2(b_4) + 2\bar{X}_1s(b_2,b_4)$$

$$= [15.04063 + 4.44178(1.8578)^2 - 2(1.8578)(7.92605)](101.94)$$

$$= (.92105)(101.94) = 93.893,$$

and the standard error is 9.6899, so that

$$t = \frac{77.93}{9.69} = 8.04,$$

a highly significant value. We have used (2) and (3) (page 196) in this computation.
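The arithmetic for the average change and its standard error can be reproduced directly. The following Python sketch is only a check on the figures quoted above; the variable names are our own, and the covariance element is entered with the negative sign implied by the subtraction in the text.

```python
import math

# Quantities quoted in the text.
b2, b4 = -75.3747, -1.3765       # regression coefficients for X2 and X4 = X1*X2
x1_bar = 1.8578                  # mean of X1
c22, c44 = 15.04063, 4.44178     # diagonal elements of the C matrix
c24 = -7.92605                   # off-diagonal element linking b2 and b4
s2 = 101.94                      # error mean square

# Average change in the predicted Y per unit change in X2.
z_bar = b2 + b4 * x1_bar

# Variance of a linear combination b2 + x1_bar * b4.
var_z = (c22 + x1_bar ** 2 * c44 + 2 * x1_bar * c24) * s2
se_z = math.sqrt(var_z)
t_value = z_bar / se_z

print(f"z-bar = {z_bar:.4f}")
print(f"s^2(z-bar) = {var_z:.3f}, standard error = {se_z:.4f}")
print(f"|t| = {abs(t_value):.2f}")
```

The sign of $t$ is negative because $\bar{z}$ is negative; the text reports its magnitude.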

This example was selected because it illustrates some of the difficulties of interpreting regression analyses when some of the fixed variates are closely related to one another. If two fixed variates are highly correlated, it is unrealistic to assume that one can be held constant while the other varies. A multiple regression coefficient can be interpreted only as the average change in $Y$ for a unit change in $X_i$ when the other $X$'s are not changed. Hence to interpret two highly correlated regression coefficients, we should like to know the relationship between the $X$'s in order to study the real change in $Y$ when one of the $X$'s changes. This is not to say that the use of a fixed variate like $X_4$ ($= X_1X_2$) is undesirable. On the contrary, it is often quite desirable to be able to say how the effect of $X_2$ differs for various values of $X_1$. For example, if $X_1$ were temperature and $X_2$ were rainfall, then it would be highly desirable to know how the effect of 1 inch of rainfall varied for different temperatures, $X_1$. We know that for most crops high rainfall may be detrimental at low temperatures but quite valuable at high temperatures. Hence a knowledge of the regression of yield on temperature and rainfall alone would be rather useless unless the cross-product term were also included (see reference 8 for an example of such a study). In some analyses, it may be impractical to consider that any of the fixed variates can be held constant while some other one varies. In this case the regression equation should be considered as a whole. One example of this is given in Exercise 15.4.

The estimated variance of the average value of $Y$ for a fixed set of $X$'s, $\{X'_i\}$, can be computed directly from the $\{C_{ij}\}$ and the value of $s^2$. Given

$$\hat{Y}' = \bar{Y} + \sum_{i=1}^{r} b_i(X'_i - \bar{X}_i),$$

$$s^2(\hat{Y}') = s^2\left[\frac{1}{n} + \sum_{i=1}^{r} C_{ii}x_i'^2 + 2\sum_{i<j} C_{ij}x'_ix'_j\right] = s^2\left[\frac{1}{n} + \sum_{i,j=1}^{r} C_{ij}x'_ix'_j\right],$$

where $x'_i = X'_i - \bar{X}_i$. Using the values of the $C_{ij}$ found by the use of the Hotelling iterative procedure (page 196), we have for our vitamin B$_2$ example

$$s^2(\hat{Y}') = 101.94\Big[\tfrac{1}{27} + x'_1(.219012x'_1 + 1.068180x'_2 - .083828x'_3 - .603104x'_4)$$
$$+\; x'_2(1.068180x'_1 + 15.040026x'_2 - .317704x'_3 - 7.926049x'_4)$$
$$+\; x'_3(-.083828x'_1 - .317704x'_2 + .095668x'_3 + .181971x'_4)$$
$$+\; x'_4(-.603104x'_1 - 7.926049x'_2 + .181971x'_3 + 4.441777x'_4)\Big].$$

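Similarly the estimated variance of a single predicted value of $Y$ is given by

$$s^2(\hat{Y}') + s^2.$$

Confidence limits can be assigned to the various estimates as follows:

$$b_i - t_\alpha s(b_i) < \beta_i < b_i + t_\alpha s(b_i),$$

$$\hat{Y}' - t_\alpha s(\hat{Y}') < E(Y \mid X') < \hat{Y}' + t_\alpha s(\hat{Y}'),$$

$$\hat{Y}' - t_\alpha\sqrt{s^2(\hat{Y}') + s^2} < Y_j \mid X' < \hat{Y}' + t_\alpha\sqrt{s^2(\hat{Y}') + s^2},$$

where

$$P(|t| > t_\alpha) = \alpha.$$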

EXERCISES 


15.1. Show how the sum of squares for the regression of $Y$ on $X_1$ and $X_2$ alone could be obtained in Sec. 15.4.

15.2. Select some data in a familiar field of application with one dependent and at least three fixed variates, and carry out the calculations leading to the estimates of the regression coefficients and their standard errors. Investigate the usefulness of the various fixed variates, and indicate whether or not any should be discarded.

15.3. (a) Show that if $X_3$ is omitted from the regression equation for the vitamin B$_2$ example, the new regression coefficients ($b''$) and $c$ values ($c''$) are

$$b''_i = b_i - \frac{b_3}{c_{33}}c_{i3}, \qquad c''_{ij} = c_{ij} - \frac{c_{i3}c_{j3}}{c_{33}}.$$

(See reference 2.)

(b) See reference 9 for the adjustments needed if one extra fixed variate is used.

15.4. A cost study was made of 89 dairy farms in 1941. The dependent variate was the amount of milk sold ($Y$) with the following fixed variates: amount of concentrates ($X_1$), amount of silage ($X_2$), pasture cost ($X_3$), and amount of roughage ($X_4$). The means and sums of squares and cross products adjusted for the means are given below, with the amounts in thousand pounds per cow and the pasture cost in $10 per cow.




           X1          X2          X3          X4          y

X1      50.5154    -66.1617    -4.84289    -.937732    36.7974
X2                 967.1077    13.5895     32.4425     39.0556
X3                             12.5457    -12.5195      7.02815
X4                                        192.3053      9.99432
y                                                     113.5872

Mean     2.94310     3.90647     1.16426     3.60326     5.73994


(a) Compute the regression coefficients and their standard errors. Use the abbreviated Doolittle method, and complete the backward solution also.

(b) Show that all regression coefficients except are significant at the 1 per cent probability level.

(c) Use the ij column in the forward solution to determine the error 
variance when Xi is removed from the prediction equation. 

(d) In 1941 the cost per thousand pounds was $18 for concentrates, 
$2.70 for silage, and $8.50 for roughage and milk sold for 3.2 cents per 
pound. Estimate the profit or loss and its standard error if 3,200 pounds 
of concentrates, 4,000 pounds of silage, 3,000 pounds of roughage, and 
$15 worth of pasture were used per cow. 

15.5. J. T. Wakeley has compiled some soils-weather data and data on the vitamin content of turnip greens, contributed by the Georgia Agricultural Experiment Station for the Southern Cooperative Group. The dependent variates are milligrams of ascorbic acid [$Y_1$] and micrograms of riboflavin [$Y_2$], each per 100 milligrams of dry weight. The fixed variates are soil moisture tension (atmospheres ÷ 10) [$X_1$] and mean temperature (degrees Fahrenheit ÷ 100) [$X_2$], each at 8 inches depth; total radiation in gram calories per square centimeter per minute ÷ 1,000 [$X_3$] and evaporation in centimeters [$X_4$], each for the previous 48 hours; and number of days since planting ÷ 100 [$X_5$].

The means and sums of squares and cross products are:

          X1         X2         X3         X4         X5        Mean

X1      6.6510     2.6722     4.4957     2.6338     .14360     .5475
X2                 4.8750     8.3609     4.3597   -3.2533     2.7156
X3                           19.8174     9.5315   -6.6767     1.4353
X4                                       5.2394   -3.1452      .5810
X5                                                 3.7799     1.0881
y1      -.60160   -1.5771    -1.6953    -.81984     .94173    2.8497
y2     -3.2590     7.1768    10.9862     5.9285   -7.1415     6.1834

$$Sy_1^2 = 1.5025, \qquad Sy_2^2 = 28.6900, \qquad n = 32.$$

Since the problem of computing a regression with five fixed variates is quite a long one, the student may wish to select particular variates as an exercise.

15.6. Woltz et al. and Foster analyzed 25 samples of tobacco leaf for organic and inorganic chemical constituents, and multiple regression was used to discover the nature and extent of the relationship of certain of these constituents. The dependent variates considered were rate of cigarette burn in inches per 1,000 seconds ($Y_1$), per cent sugar in the leaf ($Y_2$), and per cent nicotine ($Y_3$). The fixed variates were percentages of total nitrogen ($X_1$), of chlorine ($X_2$), of potassium ($X_3$), of phosphorus ($X_4$), of calcium ($X_5$), and of magnesium ($X_6$). The original data (25 observations) and corrected sums of squares and products are given in the accompanying tables.


Data of Exercise 15.6

                 Y1        Y2       Y3       X1       X2       X3       X4       X5       X6

                1.55     20.05     1.38     2.02     2.90     2.17      .51     3.47      .91
                1.63     12.58     2.64     2.62     2.78     1.72      .50     4.57     1.25
                1.66     18.56     1.56     2.08     2.68     2.40      .43     3.52      .82
                1.52     18.56     2.22     2.20     3.17     2.06      .52     3.69      .97
                1.70     14.02     2.85     2.38     2.52     2.18      .42     4.01     1.12
                1.68     15.64     1.24     2.03     2.56     2.57      .44     2.79      .82
                1.78     14.52     2.86     2.87     2.67     2.64      .50     3.92     1.06
                1.57     18.52     2.18     1.88     2.58     2.22      .49     3.58     1.01
                1.60     17.84     1.65     1.93     2.26     2.15      .56     3.57      .92
                1.52     13.38     3.28     2.57     1.74     1.64      .51     4.38     1.22
                1.68     17.55     1.56     1.95     2.15     2.48      .48     3.28      .81
                1.74     17.97     2.00     2.03     2.00     2.38      .50     3.31      .98
                1.93     14.66     2.88     2.50     2.07     2.32      .48     3.72     1.04
                1.77     17.31     1.36     1.72     2.24     2.25      .52     3.10      .78
                1.94     14.32     2.66     2.53     1.74     2.64      .50     3.48      .93
                1.83     15.05     2.43     1.90     1.46     1.97      .46     3.48      .90
                2.09     15.47     2.42     2.18      .74     2.46      .48     3.16      .86
                1.72     16.85     2.16     2.16     2.84     2.36      .49     3.68      .95
                1.49     17.42     2.12     2.14     3.30     2.04      .48     3.28     1.06
                1.52     18.55     1.87     1.98     2.90     2.16      .48     3.56      .84
                1.64     18.74     2.10     1.89     2.82     2.04      .53     3.56     1.02
                1.40     14.79     2.21     2.07     2.79     2.15      .52     3.49     1.04
                1.78     18.86     2.00     2.08     3.14     2.60      .50     3.30      .80
                1.93     15.62     2.26     2.21     2.81     2.18      .44     4.16      .92
                1.53     18.56     2.14     2.00     3.16     2.22      .51     3.73     1.07

Sum            42.20    415.39    54.03    53.92    62.02    56.00    12.25    89.79    24.10
Sx^2 or Sy^2   .6690  101.4644   6.5921   1.8311   8.8102   1.5818    .0258   3.7248    .3828

Sums of Cross Products

          x2        x3        x4        x5        x6        y1        y2        y3

x1     -.3589    -.0125    -.0244    1.6379     .5057     .2501   -9.6105    2.6691
x2               -.3469     .0352     .7920     .2173   -1.5136   12.8511   -2.0617
x3                         -.0415   -1.4278    -.4753     .5007    2.4054    -.9503
x4                                    .0043     .0154    -.0421     .3945    -.0187
x5                                              .9120    -.1914   -9.3692    3.4020
x6                                                       -.1586   -3.2733    1.1663


(a) Analyze the regression of per cent nicotine ($Y_3$) on the percentages of total nitrogen ($X_1$) and potassium ($X_3$).

(b) Analyze the regression of cigarette burn ($Y_1$) on $X_2$, $X_3$, and $X_4$. What would be the effect of omitting $X_3$?

(c) Analyze the regression of per cent sugar ($Y_2$) on $X_1$, $X_2$, $X_5$, and $X_6$. What would be the effect of omitting $X_5$ and $X_6$?

CHAPTER 16

CURVILINEAR REGRESSION: ORTHOGONAL POLYNOMIALS

16.1. Introduction. If it is desired to fit a regression equation using successive powers of one or more fixed variates, the methods given in Chap. 15 can be applied. For example, suppose we had a series of annual rainfall data ($Y$) for the years 1900 to 1949 and wished to determine a regression line such as the following polynomial:

$$Y = \alpha + \beta_1X + \beta_2X^2 + \beta_3X^3 + \beta_4X^4 + \epsilon,$$

where $X = 1, 2, \ldots, 50$. In order to use the methods of Chap. 15, merely set $X_i = X^i$. Snedecor presents some examples using this technique of determining a polynomial regression line.

If the fixed variate is equally spaced, such as in time or space, a convenient method of curve fitting by orthogonal polynomials can be used.† This method was first developed by Tchebysheff and has since been extended by many writers. Some of the books and articles on this subject are references 3 to 8. The advantage of orthogonal polynomials over the usual regression variates arises from the fact that the former are so constructed that any term of the polynomial is independent of any other term. This property of independence permits one to compute each regression coefficient independently of the others and also facilitates testing the significance of each coefficient. A summation method is illustrated by Fisher and Snedecor, but the polynomial method presented by Fisher and Yates, with tables for $n$ through 75, is generally more expeditious if tests of significance of the succeeding terms of the polynomial are desired. Another method described by Aitken is also recommended if tables are available.

We shall consider the use of the polynomials, for which a computer's bulletin was prepared by Anderson and Houseman, who also included an extension of the tables through $n = 104$. An example is given in this bulletin for 62 annual United States sugar prices, 1875 to 1936.

† The method of orthogonal polynomials can be used for unequally spaced $X$'s, but it is not so advantageous for such data.

16.2. Determination of the Polynomial. Suppose we wanted to fit a polynomial of high enough degree to $n$ equally spaced points so that succeeding terms would not reduce the residual sum of squares by a significant amount, for example,

$$Y_j = \beta_0 + \beta_1x_j + \beta_2x_j^2 + \beta_3x_j^3 + \cdots + \beta_rx_j^r + \epsilon_j,$$

where $x_j = X_j - \bar{X} = j - \bar{j}$. Since $\bar{j} = (n + 1)/2$, we see that $x_j$ takes on the values

$$-(n - 1)/2, \; -(n - 3)/2, \; \ldots, \; (n - 3)/2, \; (n - 1)/2.$$

Let us attempt to replace the above equation by one of the form

$$Y_j = \alpha_0 + \alpha_1P_{1j} + \alpha_2P_{2j} + \cdots + \alpha_rP_{rj} + \epsilon_j,$$

where the $\alpha$'s are functions of the $\beta$'s and the $P$'s are functions of the powers of $x$, chosen so that each $P_i$ is a function of all powers of $x_j$ ($= j - \bar{j}$) equal to or less than $i$:

$$P_{ij} = C_{i,0} + C_{i,1}x_j + C_{i,2}x_j^2 + \cdots + C_{i,i}x_j^i.$$

Let us construct a set of $r$ polynomials, that is, determine the coefficients, $C$'s, so that each polynomial is orthogonal to all others including $P_0 = 1$. That is,

$$\sum_j P_{ij}P_{kj} = 0, \qquad i = 0, 1, \ldots, r; \; k \neq i.$$

In the future, $S$ will refer to summation over $j$. For a given $P_{ij}$, we want the $i$ sums for $k = 0, 1, \ldots, i - 1$ to be zero. Since each polynomial $P_k$ is a function of the powers of $x$ equal to or less than $k$, we can replace the above summations by

$$\sum_j P_{ij}x_j^k = 0, \qquad k = 0, 1, 2, \ldots, i - 1.$$

That is, we have $i$ equations to determine the values of the $(i + 1)$ coefficients: $C_{i,0}, C_{i,1}, \ldots, C_{i,i}$. Hence it is necessary to fix the value of one constant; we shall set $C_{i,i} = 1$.

Using the definition of $P_i$ in terms of the $x$'s, we have for a given $k$

$$S(C_{i,0} + C_{i,1}x_j + \cdots + C_{i,i-1}x_j^{i-1} + x_j^i)x_j^k = \sum_{l=0}^{i-1} C_{i,l}Sx^{l+k} + Sx^{i+k} = 0.$$

We know that $Sx^m = 0$ when $m$ is odd. Hence we have the following equations for $i$ odd:

  k        Equation

  0        $C_{i,0}\,n + C_{i,2}Sx^2 + \cdots + C_{i,i-1}Sx^{i-1} = 0$
  2        $C_{i,0}Sx^2 + C_{i,2}Sx^4 + \cdots + C_{i,i-1}Sx^{i+1} = 0$
  ...
  i - 1    $C_{i,0}Sx^{i-1} + C_{i,2}Sx^{i+1} + \cdots + C_{i,i-1}Sx^{2i-2} = 0$

  1        $C_{i,1}Sx^2 + C_{i,3}Sx^4 + \cdots + C_{i,i-2}Sx^{i-1} + Sx^{i+1} = 0$
  3        $C_{i,1}Sx^4 + C_{i,3}Sx^6 + \cdots + C_{i,i-2}Sx^{i+1} + Sx^{i+3} = 0$
  ...
  i - 2    $C_{i,1}Sx^{i-1} + C_{i,3}Sx^{i+1} + \cdots + C_{i,i-2}Sx^{2i-4} + Sx^{2i-2} = 0$

† We shall use the letter $P$ instead of $\xi$ in the theory which follows.


The $\tfrac{1}{2}(i + 1)$ equations for $k$ even involve only the coefficients $\{C_{i,0}, C_{i,2}, \ldots, C_{i,i-1}\}$, and since these equations have no constant term, all the coefficients are zero. But the $\tfrac{1}{2}(i - 1)$ equations for $k$ odd do have nonzero constant terms, and we can solve for the coefficients $\{C_{i,1}, C_{i,3}, \ldots, C_{i,i-2}\}$. Hence we conclude that when $i$ is odd ($i = 1, 3, \ldots$),

$$P_1 = x,$$
$$P_3 = x^3 + C_{3,1}x,$$
$$\cdots$$
$$P_i = x^i + C_{i,i-2}x^{i-2} + \cdots + C_{i,3}x^3 + C_{i,1}x.$$

Similarly it can be shown that when $i$ is even ($i = 0, 2, 4, \ldots$),

$$P_0 = 1,$$
$$P_2 = x^2 + C_{2,0},$$
$$P_4 = x^4 + C_{4,2}x^2 + C_{4,0},$$
$$\cdots$$
$$P_i = x^i + C_{i,i-2}x^{i-2} + \cdots + C_{i,2}x^2 + C_{i,0}.$$

Note that we have omitted the $j$ subscript for convenience.

From these results we see that

$$xP_i = P_{i+1} + D_{i,1}P_{i-1} + D_{i,2}P_{i-3} + \cdots,$$

where the $D$'s are functions of the $C$'s. For example,

$$xP_1 = P_2 - C_{2,0},$$
$$xP_2 = x^3 + C_{2,0}x = P_3 + (C_{2,0} - C_{3,1})x,$$
$$D_{1,1} = -C_{2,0}, \qquad D_{2,1} = C_{2,0} - C_{3,1}, \qquad D_{i,j} = 0 \text{ for } j > 1.$$

Since $(xP_i)$ is a function of powers of $x$ equal to or less than $(i + 1)$, the orthogonality requirement, $SP_lx^k = 0$ for $l > k$, guarantees that

$$SxP_iP_l = 0$$

for $l > (i + 1)$ and, because of the symmetry of $i$ and $l$, for $i > (l + 1)$.









Also since $P_i^2$ is a function of even powers of $x$, $SxP_i^2 = 0$. Hence the only cases for which $SxP_iP_l \neq 0$ are $l = i \pm 1$. And when $l = i \pm 1$, all of the terms except those for $P_{i+1}$ and $P_{i-1}$ vanish. That is,

$$SxP_iP_{i+1} = SP_{i+1}^2, \quad\text{or}\quad SxP_{i-1}P_i = SP_i^2,$$

$$SxP_iP_{i-1} = D_{i,1}SP_{i-1}^2.$$

Hence

$$D_{i,1} = \frac{SP_i^2}{SP_{i-1}^2},$$

and all other $D$'s $= 0$.

Therefore we now have a recursion formula to determine higher-order polynomials in terms of lower-order ones:

$$P_{i+1} = xP_i - \frac{SP_i^2}{SP_{i-1}^2}P_{i-1}.$$

Since $P_1 = x$, $P_0 = 1$, $SP_1^2 = Sx^2 = n(n^2 - 1)/12$, and $SP_0^2 = n$,

$$P_2 = x^2 - \frac{n^2 - 1}{12}.$$

Similarly

$$SP_2^2 = Sx^4 - n\left(\frac{n^2 - 1}{12}\right)^2 = \frac{n(n^2 - 1)(n^2 - 4)}{180},$$

$$P_3 = x^3 - \left(\frac{n^2 - 1}{12} + \frac{n^2 - 4}{15}\right)x = x^3 - \frac{3n^2 - 7}{20}x.$$

A general formula for $SP_i^2$ is

$$SP_i^2 = \frac{n(n^2 - 1)(n^2 - 4)\cdots(n^2 - i^2)\,[i!\,(i - 1)!]^2}{4(2i + 1)[(2i - 1)!]^2}.$$

Hence

$$\frac{SP_i^2}{SP_{i-1}^2} = \frac{(n^2 - i^2)i^2}{4(4i^2 - 1)}.$$

A proof of these results is beyond the scope of this book but can be found in most of the sources cited in Sec. 16.1. Hence

$$P_4 = x^4 - \left[\frac{3n^2 - 7}{20} + \frac{(n^2 - 9)9}{140}\right]x^2 + \frac{9(n^2 - 1)(n^2 - 9)}{(12)(140)}$$

$$= x^4 - \frac{3n^2 - 13}{14}x^2 + \frac{3(n^2 - 1)(n^2 - 9)}{560}.$$
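The recursion formula lends itself to direct computation. The Python sketch below is an illustration under our own naming: it builds the polynomial values at the points $x_j = j - (n + 1)/2$ by the recursion $P_{i+1} = xP_i - (SP_i^2/SP_{i-1}^2)P_{i-1}$, using exact fractions, and compares the results with the general formula for $SP_i^2$ and the explicit expression for $P_4$ given in the text. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def orthogonal_polynomials(n, degree):
    """Return (x, P) where P[i][j-1] = P_i(x_j), i = 0..degree, degree <= n - 1."""
    x = [Fraction(2 * j - n - 1, 2) for j in range(1, n + 1)]
    P = [[Fraction(1)] * n, list(x)]          # P_0 = 1 and P_1 = x
    for i in range(1, degree):
        ratio = sum(p * p for p in P[i]) / sum(p * p for p in P[i - 1])
        P.append([xj * a - ratio * b for xj, a, b in zip(x, P[i], P[i - 1])])
    return x, P[: degree + 1]


def sum_sq_closed_form(n, i):
    """General formula for S(P_i^2) quoted in the text (S(P_0^2) = n)."""
    if i == 0:
        return Fraction(n)
    value = Fraction(n)
    for k in range(1, i + 1):
        value *= n * n - k * k
    value *= (factorial(i) * factorial(i - 1)) ** 2
    return value / (4 * (2 * i + 1) * factorial(2 * i - 1) ** 2)


n = 12
x, P = orthogonal_polynomials(n, 4)
norms = [sum(p * p for p in row) for row in P]

# Explicit fourth-degree polynomial from the text, for comparison.
p4_explicit = [
    xj ** 4
    - Fraction(3 * n * n - 13, 14) * xj ** 2
    + Fraction(3 * (n * n - 1) * (n * n - 9), 560)
    for xj in x
]

for i, value in enumerate(norms):
    print(f"S(P_{i}^2) = {value}   closed form = {sum_sq_closed_form(n, i)}")
print("P_4 agrees with the explicit formula:", P[4] == p4_explicit)
```

Exact fractions are used so that the comparison with the closed forms is not blurred by rounding.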


The reader will note that these polynomial values, $P_{ij}$, will be fractions. For convenience of computing and of presentation of tables of the polynomial values, it is desirable to use only integral values for the $P_{ij}$. The regular polynomial values have been multiplied by a constant, $\lambda_i$, in references 10 and 11, so chosen that the values of $P'_{ij} = \lambda_iP_{ij}$ are integers reduced to lowest terms. Hence

$$Y_j = \alpha'_0 + \alpha'_1P'_{1j} + \alpha'_2P'_{2j} + \cdots + \alpha'_rP'_{rj} + \epsilon_j,$$

where $\alpha_i = \alpha'_i\lambda_i$ and $\alpha'_0 = \alpha_0$.

Example 16.1. As an example, consider the construction of the $P_{ij}$ values for $n = 5$ (refer to Example 6.4). Obviously, for five points, a fourth-degree polynomial will fit the data exactly; hence, only $P_1$, $P_2$, $P_3$, and $P_4$ exist. Using the formulas already indicated, we have $x_j = j - \bar{j} = j - 3$, and

$$P_{1j} = x_j, \qquad P_{2j} = x_j^2 - 2, \qquad P_{3j} = x_j^3 - \tfrac{17}{5}x_j, \qquad P_{4j} = x_j^4 - \tfrac{31}{7}x_j^2 + \tfrac{72}{35}.$$

The polynomial values are given in the table below.

  j         P_1j    P_2j    P_3j      P_4j      P'_3j    P'_4j

  1          -2       2     -6/5      12/35      -1        1
  2          -1      -1     12/5     -48/35       2       -4
  3           0      -2       0       72/35       0        6
  4           1      -1    -12/5     -48/35      -2       -4
  5           2       2      6/5      12/35       1        1

  S(P^2)     10      14     72/5     288/35      10       70

We note that $P'_{1j} = P_{1j}$ and $P'_{2j} = P_{2j}$; that is, $\lambda_1 = \lambda_2 = 1$, $\lambda_3 = \tfrac{5}{6}$ and $\lambda_4 = \tfrac{35}{12}$; so that $P'_{3j} = (5P_{3j})/6$ and $P'_{4j} = (35P_{4j})/12$.
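The multipliers $\lambda_i$ in this example can be recovered mechanically from the fractional polynomial values. The Python sketch below is one way of doing so and is our own illustration, not the procedure used to build the published tables: it takes $\lambda$ as the least common multiple of the denominators divided by the greatest common divisor of the numerators, which yields integers reduced to lowest terms. The function name `integer_scale` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_scale(values):
    """Return (lam, scaled): lam * values are integers with no common factor."""
    den_lcm = reduce(lambda a, b: a * b // gcd(a, b),
                     (v.denominator for v in values), 1)
    num_gcd = reduce(gcd, (abs(v.numerator) for v in values), 0)
    lam = Fraction(den_lcm, num_gcd)
    return lam, [int(lam * v) for v in values]


# Polynomial values for n = 5 from the formulas of Example 16.1.
x = [Fraction(j - 3) for j in range(1, 6)]
p1 = list(x)
p2 = [xj ** 2 - 2 for xj in x]
p3 = [xj ** 3 - Fraction(17, 5) * xj for xj in x]
p4 = [xj ** 4 - Fraction(31, 7) * xj ** 2 + Fraction(72, 35) for xj in x]

for name, values in (("P1", p1), ("P2", p2), ("P3", p3), ("P4", p4)):
    lam, scaled = integer_scale(values)
    print(f"{name}: lambda = {lam}, integer values = {scaled}")
```

For $n = 5$ this reproduces $\lambda_3 = 5/6$ and $\lambda_4 = 35/12$ and the integer columns of the table.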

16.3. Estimating the Parameters, $\alpha'_i$ and $\sigma^2$. We are given the regression equation to fit to $n$ equally spaced points:

$$Y_j = \sum_{i=0}^{r} \alpha'_iP'_{ij} + \epsilon_j, \qquad r + 1 < n,$$

where the $\{\epsilon\}$ are assumed NID$(0, \sigma^2)$ and

$$\hat{Y}_j = A'_0 + A'_1P'_{1j} + \cdots + A'_rP'_{rj},$$

with $\hat{\alpha}'_i = A'_i$. The least-squares equations to determine the $A'$ are quite simple, because of the orthogonal property of the $P'$'s. In fact $A'_0 = \bar{Y}$, and

$$A'_i = \frac{S(YP'_i)}{S(P'_i)^2}.$$

The polynomial tables in reference 11 present values of the $P'_i$, $\lambda_i$, and $S(P'_i)^2$ for values of $n$ through 104. In these tables, $P'_i = \xi'_i$.

The sum of squares due to regression is given by

$$\text{SSR} = n\bar{Y}^2 + A'_1S(YP'_1) + A'_2S(YP'_2) + \cdots + A'_rS(YP'_r),$$

where each term is the independent reduction due to the successive polynomials: mean, $P'_1, P'_2, \ldots, P'_r$. Also

$$\text{SSE} = SY^2 - \text{SSR} = Sy^2 - \sum_{i=1}^{r} \text{SSR}_i,$$

where

$$\text{SSR}_i = A'_iS(YP'_i) = (SYP'_i)^2/S(P'_i)^2.$$

$s^2 = \text{SSE}/(n - r - 1)$ is an unbiased estimate of $\sigma^2$. Hence we can set up the following analysis of variance:


Source of variation               Degrees of freedom    Mean square

Linear regression (P'_1)                  1             SSR_1 = (SYP'_1)^2 / S(P'_1)^2
Quadratic regression (P'_2)               1             SSR_2 = (SYP'_2)^2 / S(P'_2)^2
. . .                                     .             . . .
rth degree regression (P'_r)              1             SSR_r = (SYP'_r)^2 / S(P'_r)^2
Error                                 n - r - 1         s^2 = SSE/(n - r - 1)


In general we do not know what degree ($r$) should be used; so tests of successive terms are made until there is no material reduction in $s^2$ by the use of additional terms.†

It should be pointed out that the equation

$$\hat{Y} = A'_0 + A'_1P'_1 + \cdots + A'_rP'_r$$

can be used for prediction purposes. However, if the original polynomial form of the equation is needed, the $P'_i$ will have to be replaced by their equivalent polynomial function,

$$P'_i = \lambda_i(C_{i,0} + C_{i,1}x + C_{i,2}x^2 + \cdots + x^i).$$

This is adequately explained in reference 11.

† This procedure introduces a slight bias into the estimate of $\sigma^2$ because the procedure is unbiased only if the model is postulated in advance ($r$ is known in advance). We doubt that this bias is very serious, but some theoretical or empirical studies of this point should be made.

Example 16.2. In order to compare the number of spears per asparagus plant for male and female plants, the California Agricultural Experiment Station at Davis obtained the following data on the differences ($Y$)


Year (j)     1     2     3     4     5     6     7     8     9    10    11    12    Sum of squares

Y          1.1   7.1  11.0  12.6  14.7  19.9  25.1  23.9  23.1  23.6  26.0  24.6      4,516.43
P'_1       -11    -9    -7    -5    -3    -1     1     3     5     7     9    11           572
P'_2        55    25     1   -17   -29   -35   -35   -29   -17     1    25    55        12,012
P'_3       -33     3    21    25    19     7    -7   -19   -25   -21    -3    33         5,148
P'_4        33   -27   -33   -13    12    28    28    12   -13   -33   -27    33         8,008

[Fig. 16.1. Changes in differences between mean number of male and female asparagus spears, 1925 to 1936. The figure plots the actual difference and the estimated difference against year.]

between the mean number of spears per plant for each of the years 1925 to 1936, here indicated as years 1 to 12 (see Fig. 16.1). The first four sets of polynomial values are given below the yields. The $\lambda$'s are $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = \tfrac{2}{3}$, $\lambda_4 = \tfrac{7}{24}$. The computed results are given below.

SY     = 212.7,         A'_0 = 17.725,         nȲ^2  = 3,770.11,
SP'_1Y = 602.1,         A'_1 = 1.05262,        SSR_1 = 633.78,
SP'_2Y = -1,025.7,      A'_2 = -.0853896,      SSR_2 = 87.58,
SP'_3Y = -19.5,         A'_3 = -.00378788,     SSR_3 = .07,
SP'_4Y = 71.7,          A'_4 = .00895355,      SSR_4 = .64.


The analysis of variance is as follows:

Source of variation               Degrees of freedom   Sum of squares   Mean square

Sy^2                                      11               746.32
Linear regression (P'_1)                   1               633.78          633.78
Deviations from linear                    10               112.54           11.25
Quadratic regression (P'_2)                1                87.58           87.58
Deviations from quadratic                  9                24.96            2.77
Cubic regression (P'_3)                    1                  .07             .07
Deviations from cubic                      8                24.89            3.11
Cubic and quartic regression               2                  .71             .36
Deviations from quartic                    7                24.25            3.46


On the basis of these results, we conclude that the regression is quadratic and that the best estimate of $\sigma^2$ is 2.77 with 9 degrees of freedom. It should be indicated that successive terms of the polynomial are tested by use of the deviations from regression. For example, we test for the existence of linear regression by use of

$$F' = (633.78)/(11.25) = 56.34$$

with (1,10) degrees of freedom. This is not an exact test, because we find out later that the denominator is an overestimate of $\sigma^2$. However, the test is certainly on the safe side; that is, if $F'$ is significant, the $F$ by use of the best estimate of $\sigma^2$ will certainly be significant. Hence we can use $F'$ as a guide to the advisability of considering any form of regression. It is always advisable to compute at least one more polynomial value after the first nonsignificant one. That is, we might have stopped with $P'_3$, but it is advisable to try out $P'_4$ to make sure it is not significant.
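The successive reductions and the ratios $F'$ can be generated term by term from the tabulated polynomial values. The Python sketch below is an illustration with our own variable names; it uses the data of Example 16.2 and tests each term against the deviations remaining after that term is fitted. Because it carries full precision, the ratios may differ in the last figure from those computed from rounded table entries, and it tests the cubic and quartic terms singly rather than jointly as in the table.

```python
# Data of Example 16.2: yearly differences and tabulated integer polynomial values.
Y = [1.1, 7.1, 11.0, 12.6, 14.7, 19.9, 25.1, 23.9, 23.1, 23.6, 26.0, 24.6]
P = {
    1: [-11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11],
    2: [55, 25, 1, -17, -29, -35, -35, -29, -17, 1, 25, 55],
    3: [-33, 3, 21, 25, 19, 7, -7, -19, -25, -21, -3, 33],
    4: [33, -27, -33, -13, 12, 28, 28, 12, -13, -33, -27, 33],
}

n = len(Y)
residual_ss = sum(y * y for y in Y) - sum(Y) ** 2 / n   # Sy^2 with n - 1 d.f.
residual_df = n - 1

rows = []
for degree in sorted(P):
    syp = sum(y * p for y, p in zip(Y, P[degree]))      # S(Y P'_i)
    ssr = syp ** 2 / sum(p * p for p in P[degree])      # reduction due to P'_i
    residual_ss -= ssr                                  # deviations after this term
    residual_df -= 1
    f_prime = ssr / (residual_ss / residual_df)
    rows.append((degree, ssr, residual_ss, residual_df, f_prime))

for degree, ssr, rss, df, f_prime in rows:
    print(f"degree {degree}: SSR = {ssr:8.2f}  deviations = {rss:7.2f} "
          f"({df} d.f.)  F' = {f_prime:6.2f}")
```

The linear and quadratic terms give large ratios, while the cubic and quartic ratios fall well below 1, in line with the conclusion that the regression is quadratic.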
Using the quadratic regression equation,

$$\hat{Y} = 17.725 + 1.053P'_1 - .0854P'_2$$

$$= 17.725 + 2.106(j - 6.5) - .0854[3(j - 6.5)^2 - 35.75]$$

$$= 17.725 + 2.106(j - 6.5) - .0854(3j^2 - 39j + 91)$$

$$= -3.735 + 5.437j - .2562j^2$$

$$= (1.45, 6.11, 10.27, 13.91, 17.04, 19.66, 21.77, 23.36, 24.45, 25.02, 25.07, 24.61).$$

These results are displayed graphically in Fig. 16.1.
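The fitted values and the conversion to powers of $j$ can be checked as follows. This Python sketch is illustrative and uses our own variable names. It keeps the coefficients at full precision, so the results agree with the rounded figures in the text only to about one unit in the second decimal place.

```python
# Data of Example 16.2 and the first two tabulated polynomials.
Y = [1.1, 7.1, 11.0, 12.6, 14.7, 19.9, 25.1, 23.9, 23.1, 23.6, 26.0, 24.6]
P1 = [-11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11]
P2 = [55, 25, 1, -17, -29, -35, -35, -29, -17, 1, 25, 55]

n = len(Y)
a0 = sum(Y) / n                                                     # A'_0
a1 = sum(y * p for y, p in zip(Y, P1)) / sum(p * p for p in P1)     # A'_1
a2 = sum(y * p for y, p in zip(Y, P2)) / sum(p * p for p in P2)     # A'_2

# Fitted values from the orthogonal form.
fitted = [a0 + a1 * p1 + a2 * p2 for p1, p2 in zip(P1, P2)]

# Same equation in powers of j, using P'_1 = 2(j - 6.5) and
# P'_2 = 3(j - 6.5)^2 - 35.75 = 3j^2 - 39j + 91.
c0 = a0 - 13 * a1 + 91 * a2
c1 = 2 * a1 - 39 * a2
c2 = 3 * a2

print(f"Y-hat = {c0:.3f} + {c1:.3f} j + ({c2:.4f}) j^2")
print([round(f, 2) for f in fitted])
```

Either form of the equation may be used for prediction; the orthogonal form has the advantage that the coefficients already computed do not change if a further term is added.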




EXERCISES 

16.1. (a) Prove that $A'_i$ is an unbiased estimate of $\alpha'_i$, $\sigma^2(A'_i) = \sigma^2/S(P'_i)^2$, and $\sigma(A'_iA'_l) = 0$ for $i \neq l$.

(b) Prove that $s^2$ is an unbiased estimate of $\sigma^2$.

(c) Prove that SSR is independent of SSE.

16.2. Derive the formula for $P_5$.

16.3. Compute the polynomial values, $P'_i$, $i = 0, 1, \ldots, 5$, for $n = 6$.

16.4. (a) Determine the standard errors of $A'_1$ and $A'_2$ in Example 16.2, and then set up 95 per cent confidence limits for the true regression coefficients $\alpha'_1$ and $\alpha'_2$.

(b) Do the same for $A_1$ and $A_2$ as estimates of $\alpha_1$ and $\alpha_2$.

(c) What is the standard error of $\hat{Y}_j$?

16.5. Given the following data on the output ($Y_1$) in pounds per man-hour for General Motors employees and the aggregate weight of all General Motors automobile products in millions of pounds ($Y_2$) for the years 1929 to 1941.

Year      Y1        Y2

1929     12.55     5,632
1930     13.41     3,537
1931     14.79     3,175
1932     14.56     1,692
1933     15.05     2,477
1934     15.67     3,812
1935     17.60     5,338
1936     18.30     6,617
1937     18.61     6,868
1938     19.69     4,020
1939     19.87     5,618
1940     20.81     7,492
1941     21.83     8,631

(a) Use a set of orthogonal polynomial tables (reference 10 or 11), or construct such a set for $n = 13$, to determine the best fitting polynomial to fit to each set of data.

(b) If the same degree equation is to be used for both sets of data, what degree would you advise?

(c) How would you determine the regression of $Y_2$ on $Y_1$, both adjusted for time trends? What theoretical difficulties do you see in the determination of the regression of $Y_2$ on $Y_1$?

16.6. Snedecor presents the following problem on the average heights of sunflowers:

Week       1    2    3    4     5     6     7     8     9    10    11    12

Height    18   36   68   98   131   170   206   228   247   250   254   254

(a) What degree of polynomial should be used to estimate the height after $j$ weeks?

(b) Plot the actual and estimated heights.

(c) Determine the standard errors of the regression coefficients.

16.7. We know that the graph of the equation

$$Y = 10 - 2X + 3X^2 - X^3$$

passes through the points

(1,10), (2,10), (3,4), (4,-14), (5,-50), (6,-110).

Use these points to derive the prediction equation

$$\hat{Y} = A'_0 + A'_1P'_1 + A'_2P'_2 + A'_3P'_3$$

and show that $\hat{Y} = Y$ in this case.

16.8. The student is advised to select some data from his own field of research for which a polynomial prediction equation is useful and to carry out the necessary computations to determine $\hat{Y}$.

16.9. (a) Use the general definition of orthogonal polynomials,

$$\sum_j P_{ij}x_j^k = 0, \qquad i > k,$$

to derive a complete set of orthogonal polynomial values for the following $X$'s: 2, 3, 6, 9.

(b) When would it be useful to derive a set of orthogonal polynomial values for unequally spaced $X$'s?

CHAPTER 17

LEAST SQUARES FOR EXPERIMENTAL DESIGN MODELS

17.1. Introduction. The least-squares methods outlined in the preceding chapters can be used to analyze the effects of different treatments in a planned experiment. We shall consider a variety of experimental results in succeeding chapters, among these being the following:

(i) The yields of a crop for different varieties and different fertilizers.

(ii) The gains of weight of animals for different rations.

(iii) The effects of different temperatures on the plating of steel wire.

(iv) The effects of varying periods of cold storage on the tenderness and flavor of beef.

(v) The toxicity of various chemicals on insects.

In the discussion which follows, the word treatments will be used to apply to the varieties, fertilizers, rations, or chemicals used in the experiment. And the word yield will be used to describe the result of the experiment, whether it be bushels of oats, pounds of grain, or per cent of insects killed by a spray. It is further assumed that the yields were obtained by applying the treatments to a variety of experimental units, such as plots of ground, animals, or pieces of wire.

When the experiment is set up, some restrictions may be placed on the assignment of the treatments to the experimental units. These restrictions are included in the general topic of experimental design and are extensively discussed in references 1 to 4. Four important types of designs will be discussed in the next two chapters.

(i) Completely randomized designs. No restrictions are placed on the assignment of the treatments to the experimental units, except that each treatment shall be used a specified number of times.

(ii) Randomized complete-blocks designs. The experimental units are divided into relatively homogeneous sets, called blocks, the number of experimental units per block being equal to the number of treatments and each treatment assigned to one experimental unit in each block.

(iii) Latin-square designs. The experimental units are arranged in homogeneous blocks in two directions, called rows and columns, each treatment appearing in each row and column once and only once.

(iv) Incomplete-blocks designs. The number of treatments is greater than the number of experimental units which can be grouped in each block.


The first three designs, which can be classified as complete-blocks designs, will be discussed in Chap. 18 and the incomplete-blocks designs in Chap. 19.

Then in Chap. 20 different combinations of treatments will be discussed as factorial designs. These factorial designs can be used with any of the above field designs and are discussed in detail in references 1, 2, and 5.

Some attention will also be paid to the analysis of data collected in 
sample surveys. The basic data are collected from sampling units, such 
as people or plots of ground; after the sampling units have been con¬ 
tacted, they may be subdivided into subclasses according to some of the 
information secured in the survey. Or the population may be divided 
into classes before the survey is started and a fixed number of sampling 
units contacted in each class. Since most surveys are set up with 
several sources of random variation, the}^ will be discussed in detail in 
Chap. 22. A few examples are included in Sec. 18.2, where the sampling 
design corresponds to a completely randomized experimental design. 
In the discussion which follows we shall refer only to the analysis of 
experimental results, but the reader should understand that survey 
results can be analyzed by similar methods if the assumptions apply. 

When an experimenter sets up an experiment to examine the yielding 
ability of different treatments, he usually desires the following
information:

(i) A test of whether there are any real treatment effects. 

(ii) Average yield or effect of each treatment or the difference in yields 
between two different treatments. 

(iii) Confidence limits for these average yields or differences. 

(iv) A measure of the precision of the experiment. 

(v) An estimate of the efficiency of the particular experimental design 
as compared with some alternative design. 

The experimenter should have some ideas concerning the pattern of 
response of the experimental material to the treatments. In order to 
use the method of least squares to analyze the data, we must assume that 
the yield for a given experimental unit can be represented as a linear 
function of the treatment effect, block effect, and error for that particular 
experimental unit, that is,

Yield = (general mean) + (treatment effect) + (block effect) + (error), 

where the mean measures the average yield for this experimental setup, 
the treatment and block effects add or subtract from the mean depending 
on how they affect the yield, and the error measures the failure of these 
effects to predict the exact yield. The error is a measure of the
uncontrolled sources of variation in the experiment, such as differences in the
yielding abilities of experimental units in the same block, and errors in 
applying the treatments to the experimental units and in collecting 
the data. 

If the errors in the above experimental model can be assumed to be
noncorrelated and to have the same amount of variability for every treatment
and block, the method of least squares will give unbiased minimum variance
estimates of the effects. In addition, if the errors are normally
distributed, the F and t tests can be used to test for the significance of 
treatment differences, and the usual confidence limits for means can be 
used. A more detailed discussion of these assumptions is presented in 
the next section. 

As indicated in previous chapters, a computational device which 
furnishes the requisite data for estimating variances and making tests 
of significance for the method of least squares is the analysis of variance. 
The analysis of variance is a simple arithmetic device for dividing the 
total sum of squares into separate orthogonal parts, and if the least- 
squares assumptions are met, these sums of squares can be used to make 
tests and to set up confidence limits. 

In Chap. 14 it was shown that if we had r fixed variates split into two
groups of k and (r - k) variates each, the prediction equation was

$$Y = \mu + \sum_{i=1}^{k} \beta_i X_i + \sum_{i=k+1}^{r} \beta_i X_i + \epsilon.$$

The analysis of variance was:

Source of variation                 Degrees of    Sum of           Mean square
                                    freedom       squares
First k fixed variates              k             $R_k$            $R_k/k$
Added reduction by last (r - k)
  fixed variates                    r - k         $R_r - R_k$      $(R_r - R_k)/(r - k)$
Error                               n - r - 1     SSE              $s^2 = \text{SSE}/(n - r - 1)$

We can test the null hypothesis that the last (r - k) $\beta$'s are 0, without
assuming anything about the first k $\beta$'s, by use of
$F = (R_r - R_k)/[(r - k)s^2]$, with (r - k) and (n - r - 1) degrees of
freedom. We generally wish to eliminate the effect of the mean from the
regression first and do not want to test if the mean is 0. Hence everything
has been indicated as deviations from the mean, with $\mu$ estimated by
$\bar{Y}$. When it is desired to test the null hypothesis that any subset of
(r - k) of the $\beta_i$'s are all zero, we would determine the reduction,
$R_r$, due to all r fixed variates and then the reduction $R_k$ due to the
other k fixed variates alone. The difference between the two reductions
furnishes the necessary data to test the desired hypothesis by use of F, as
indicated above. If the two sets of fixed variates happen to be orthogonal,
then the first k fixed variates can be tested by use of $F = R_k/ks^2$. This
merely means that the added reduction for these k fixed variates after
fitting the last (r - k) fixed variates alone is the same as $R_k$ or,
conversely, that the reduction due to fitting the last (r - k) fixed
variates alone is the same as $R_r - R_k$.
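
The test of an added reduction can be illustrated numerically. The following is only a sketch under stated assumptions: the data are simulated, and the names `reduction` and `added_reduction_test` are hypothetical helpers introduced here, not notation from Chap. 14. All variates are taken as deviations from their means, as described above.

```python
import numpy as np

def reduction(X, y):
    """Reduction in the sum of squares about the mean due to the columns of X."""
    Xc = X - X.mean(axis=0)          # deviations from the mean, as in the text
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    fitted = Xc @ coef
    return float(fitted @ fitted)

def added_reduction_test(X_first, X_last, y):
    """F for the last (r - k) fixed variates after fitting the first k."""
    n, k = X_first.shape
    r = k + X_last.shape[1]
    R_k = reduction(X_first, y)
    R_r = reduction(np.hstack([X_first, X_last]), y)
    total = float(((y - y.mean()) ** 2).sum())
    s2 = (total - R_r) / (n - r - 1)              # error mean square
    F = (R_r - R_k) / ((r - k) * s2)
    return {"R_k": R_k, "R_r": R_r, "s2": s2, "F": F, "df": (r - k, n - r - 1)}

# Hypothetical data: n = 12 observations, k = 1 first variate, r - k = 2 last variates.
rng = np.random.default_rng(0)
X_first = rng.normal(size=(12, 1))
X_last = rng.normal(size=(12, 2))
y = 5.0 + 2.0 * X_first[:, 0] + 1.0 * X_last[:, 0] + rng.normal(size=12)

result = added_reduction_test(X_first, X_last, y)
print(result)
```

The F value would be referred to the F table with the degrees of freedom returned in `df`.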

In a planned experiment, the first set of fixed variates could represent 
block effects and the second set treatment effects. As will be explained 
in more detail in later sections, the X’s are either 1 or 0 for experimental 
data and the regression coefficients are the effects. Many experiments 
are designed so that the block and treatment effects are orthogonal; in 
other words, each part of the analysis of variance can be used to test a 
particular set of effects. When the treatment and block effects are 
orthogonal, the computations are simplified and the estimates of the 
effects are generally more efficient. However, there are circumstances 
for which orthogonality is not possible or advisable. Some of these 
are discussed in Chap. 19. 

The reader is advised to read reference 6 for a discussion of the general 
theory of tests of significance for analysis-of-variance problems and 
reference 7 for the power of these significance tests. In the chapters 
which follow, it will be shown that the expected value of the reduction 
in the total sum of squares attributable to certain treatments used in the 
experiments can be written as 

$$(p - 1)\sigma^2 + r \sum_{i=1}^{p} \tau_i^2,$$

where the p $\{\tau_i\}$ are the true treatment effects, each treatment being used
r times in the experiment, and $\sigma^2$ is the population variance. Tang has
prepared tables of the Type II error (1 - power) in terms of the treatment
and error degrees of freedom and a constant $\phi$, where

$$\phi = \sqrt{\frac{r \sum_i \tau_i^2}{p\,\sigma^2}}.$$
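
As a small numerical illustration of these two expressions, the sketch below evaluates the expected reduction and the constant $\phi$ for a set of treatment effects. The effects, the number of replications, and the variance are hypothetical values chosen for the example, and the function names are introduced here only for illustration.

```python
import math

def expected_reduction(tau, r, sigma2):
    """(p - 1) sigma^2 + r * sum(tau_i^2) for p treatments, each used r times."""
    p = len(tau)
    return (p - 1) * sigma2 + r * sum(t * t for t in tau)

def tang_phi(tau, r, sigma2):
    """phi = sqrt(r * sum(tau_i^2) / (p * sigma^2))."""
    p = len(tau)
    return math.sqrt(r * sum(t * t for t in tau) / (p * sigma2))

# Hypothetical true effects (summing to zero), r = 6 replications, sigma^2 = 4.
tau = [-1.0, 0.0, 1.0]
print(expected_reduction(tau, r=6, sigma2=4.0))
print(tang_phi(tau, r=6, sigma2=4.0))
```

With $\phi$ and the treatment and error degrees of freedom in hand, the Type II error would then be read from the tables.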

17.2. Assumptions Made in the Experimental Model. Before using 
the analysis of variance to summarize the results of an experiment, it is 
advisable to check the reasonableness of the assumptions which are set 
up. A discussion of these assumptions, the consequences of their not 
being true, and some methods of handling aberrant data are presented 
in the March, 1947, issue of Biometrics. A brief summary of these
assumptions is presented here, but the reader might prefer to reread this 
section after going over the next two chapters.

(i) In order to connect the analysis of variance with the theory of linear
regression, we must assume that the various fixed effects and the error are
additive. If the physical make-up of the data or of the operations
producing the data is of such nature that the effects are really not additive,
then the sums of squares attributable to such effects by the analysis of
variance do not represent the true effects. If nonadditive effects are
actually multiplicative, then we could use the logarithms of the original
data and have additive effects. Some other type of transformation
might be useful for other kinds of nonadditivity. Except for the
multiplicative case, it is usually assumed that the additive assumption is a
good first approximation to the true relationship and that refinements
here would produce only minor improvements in the analysis. In
general, if the treatment and block effects are small (for example, the
largest mean is not more than 50 per cent greater than the smallest), it is
doubtful that one needs to worry about nonadditivity. But if there are
large treatment or block effects, caution should be exercised in the use
of an additive linear model. With large treatment differences, these
differences may not be the same from one block to another. This is called
an interaction between treatments and blocks; if this interaction is also
additive, a design may be set up to measure it (see Chap. 23). For
some recent ideas on this subject, see reference 9.

(ii) In addition, the errors must be noncorrelated. In field experiments,
it is known that the errors in adjacent plots are usually positively
correlated. A device, called randomization, is used to circumvent much of
this difficulty. By this device, the treatments are allocated at random
to the experimental units, subject to the design restrictions mentioned
in the previous section, so that there is an equal chance of any two
treatments appearing in adjacent and in nonadjacent plots. Hence the
expected value of the total error for any one treatment is independent
of that for any other treatment. It should be emphasized that
randomization does not remove the correlation between the inherent
characteristics of adjacent plots; rather it provides a mechanism by which this
expected correlation between two treatments tends to cancel as the
number of experimental units per treatment is increased. A more
complete treatment of this subject is given by R. A. Fisher, Cochran,
and Yates. Cochran and Cox (page 8) make the following remark
concerning randomization: "Randomization is somewhat analogous to
insurance, in that it is a precaution against disturbances that may or may
not occur and that may or may not be serious if they do occur. It is
generally advisable to take the trouble to randomize even when it is not
expected that there will be any serious bias from failure to randomize."
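
The allocation of treatments to units at random can be sketched in a few lines for the simplest case, in which the only design restriction is that each treatment appears on the same number of units. This is an illustrative sketch only; the function name `completely_randomized_layout` and the choice of 5 treatments on 4 units each are assumptions made for the example.

```python
import random

def completely_randomized_layout(p, r, seed=None):
    """Assign p treatments to r*p units at random, each treatment on r units."""
    rng = random.Random(seed)
    labels = [t for t in range(1, p + 1) for _ in range(r)]
    rng.shuffle(labels)               # every arrangement is equally likely
    return labels                     # labels[u] is the treatment on unit u

layout = completely_randomized_layout(p=5, r=4, seed=1)
print(layout)
```

A fresh randomization (a new seed, or none at all) would be drawn for each experiment.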

Such data as daily price and production figures cannot fulfill the
randomization requirement. Hence it is generally stated that the
analysis of variance is not a valid method of testing for yearly or monthly 
price differences. To date, the mathematical difficulties of testing for the 
existence of correlations between errors have prevented any definite 
statements on the validity of the analysis of variance for such data. 
Research in the field of serial correlation may lead to an approximate 
solution of this problem. The serial correlation coefficient measures the 
correlation between successive members of a series. See reference 11 for 
some recent articles on this subject. 

(iii) In order to have a simple analysis of variance, it is desirable that
the error variance be the same from one experimental unit to another,
regardless of the treatment used. Also, if the error variances are not
equal, it is necessary to know
the relative sizes of the variances before the experiment is conducted, a
knowledge which is generally not available. Sometimes the data can be
split into parts, each part with homogeneous errors, but with unequal
errors from one part to another. An example of this is presented in
Chap. 3 of reference 1 and by Cochran. For example, it often happens
that some experimental units are subjected to a control treatment, which
often is no treatment at all. If the experimental units are quite variable
and the treatment is effective in raising the yield, there may be much
more variability between control yields than between yields from treated
plots, because the treatments will tend to increase the yields of the poor
plots more than the good plots (hence decreasing the variability).

Bartlett discusses various transformations of the original data to
stabilize the variance when there is a fixed relationship between the mean
and the variance. For example, we have shown that if a sample is
drawn from a Poisson population, the mean and the variance will tend to
be the same. In this case, it can be shown that if the dependent variable
$Z = \sqrt{Y}$ is used in the analysis, the variance of Z will be approximately
the same for all levels of Z; if there are many small values of Y, use
$Z = \sqrt{Y + \tfrac{1}{2}}$. Many social, economic, and production data tend to
have the standard deviation increase with the mean; the variance of the
transformed variate Z = log Y will tend to be approximately stable.
However, it should be apparent in all of these cases that if the treatment 
and block differences are small, there is seldom any need for a 
transformation. 
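
The effect of the square-root transformation on Poisson data can be checked by a small sampling study. The sketch below is illustrative only: the means, the sample size, and the function name `variance_before_and_after` are assumptions made for the example. For each mean it compares the sample variance of Y with that of the square root of Y.

```python
import numpy as np

def variance_before_and_after(means, n=200_000, seed=0):
    """Sample variance of Y and of Z = sqrt(Y) for Poisson samples at several means."""
    rng = np.random.default_rng(seed)
    rows = []
    for lam in means:
        y = rng.poisson(lam, size=n)
        rows.append((lam, float(y.var(ddof=1)), float(np.sqrt(y).var(ddof=1))))
    return rows

rows = variance_before_and_after([10, 40, 160])
for lam, var_y, var_z in rows:
    print(f"mean {lam:>4}: var(Y) = {var_y:7.2f}   var(sqrt Y) = {var_z:.3f}")
```

The variance of Y should follow the mean, while the variance of the transformed variate should stay near one-quarter at every level.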

(iv) If it is desired to use the analysis of variance to make the usual tests 
of significance or set up confidence limits, the errors must be normally 
distributed. The principal nonnormal feature to be concerned about is 
skewness. But even this probably does not affect two-tailed tests too 
much. However, there may be serious biases for single-tailed tests or 
confidence limits. Some investigations have been made on the effects
of nonnormality on the significance levels and power of tests (see
Cochran), but few general results have been obtained. Perhaps more
empirical sampling studies should be made.

17.3. Principles of Setting Up an Experiment. While the statistician 
and experimenter are deciding on a realistic model which can be analyzed 
by available techniques (including transformations, if necessary), the 
details of how the experiment is to be conducted and analyzed should also 
be outlined. 

(i) The experimenter should clearly set forth his objectives before proceeding 
with the experiment. Is this a preliminary experiment to determine the 
future course of experimentation, or is it intended to furnish answers to 
immediate questions? Are the results to be carried into practical use at 
once, or are they to be used to explain aspects of theory not adequately 
understood before? Are you mainly interested in estimates or in tests of 
significance? Over what range of experimental conditions do you wish to 
extend your results? 

(ii) The experiment should be described in detail. The treatments should 
be clearly defined. Is it necessary to use a control treatment in order to 
make comparisons with past results? The size of the experiment should 
be determined. If insufficient funds are available to conduct an
experiment from which useful results can be obtained, the experiment should
not be started. And above all, the necessary material to conduct the 
experiment should be available. 

(iii) An outline of the analysis should be drawn up before the experiment 
is started. 

17.4. Methods of Increasing the Accuracy of the Experiment. Accuracy
refers to the success of estimating the true value of a quantity; it is
often confused with precision, which refers to the clustering of sample
values about their own average, which will not be the true value if this
average is biased. Precision can be thought of as the inverse of variance.
Hence accuracy is a more inclusive term, since it involves both biasedness
and precision. Often the experimenter has to choose between an unbiased 
estimate with rather low precision (high variance) and a slightly biased 
one with high precision. The choice of the proper estimate is often 
dictated by circumstances beyond his control, but certain methods of 
increasing the accuracy of the experiment should be kept in mind. If 
there is no bias in the experimental and estimation procedures, accuracy 
and precision are for all practical purposes synonymous. 

(i) Accuracy can generally be increased by increasing the size of the
experiment. There are certain limitations to this statement, such
as the fact that increasing size may bring in more heterogeneous
material and also may result in poorer supervision of the experiment,
with a possibly biased result. The latter point is often true
in industrial and psychological experiments and in sample surveys.
But in general the accuracy of an estimate increases with increasing
size of the experiment. Increasing the number of blocks or
treatments also furnishes more degrees of freedom for the estimate
of the experimental error.

(ii) The experimental techniques should be refined as much as possible.

(a) There should be a uniform method of applying the treatments
to the experimental units.

(b) Sufficient control over external influences should be exercised
so that every treatment operates under as nearly the same
conditions as possible.

(c) Unbiased measures of the treatment effects should be devised
so that they are fully understood by those running the experiment
and by other research workers. As yet we do not have
good enough measures of such things as socioeconomic status, 
educational progress, and economic conditions to enable us to 
compare adequately the results of one experiment or sample 
with another. 

(d) As far as possible, checks should be set up to avoid gross errors 
in recording and analyzing the data. 

(iii) The experimental material should be selected to suit the experiment.

(a) The size and shape of the experimental unit should be
prepared to achieve maximum accuracy and unbiasedness.

(b) Often additional measurements can be taken to help explain 
the final results (see Chap. 21 on covariance techniques). 

(c) Finally the treatments should be grouped together in the best
manner. In other words, the proper selection of the experimental
design is of the utmost importance. If there are too
many treatments or the experimental units are quite heterogeneous,
an incomplete-blocks design might be used. If
interactions are important, some type of factorial design is 
needed; if higher-order interactions are not important, some 
system of confounding might be used (see Sec. 20.6). The 
problem of whether to balance the treatments or not must be 
decided on the basis of the importance of various comparisons 
and the amount of experimental material available. 

Anyone who has attended lectures on experimental designs by Prof. 
Gertrude Cox will recognize the origin of many of the above remarks, 
which also have been included in reference 1.

CHAPTER 18

THE ANALYSIS OF DESIGNS IN COMPLETE BLOCKS

18.1. Steps in the Analysis. In the analysis of experimental data,
we shall generally consider the following steps:

(i) Construct a regression model which expresses symbolically the
nature of the variables used in the experiment and the method of
conducting the experiment.

(ii) Determine estimates of the parameters in the regression model by
least squares.

(iii) Indicate the reduction in the total variation due to the regression
in an analysis of variance table.

(iv) Compute $s^2$ and the necessary F values.

(v) Compute variances of the estimates in (ii) and confidence limits.

(vi) If possible, compare the error variance using this design with the
expected variance if some other design had been used; this is called the
efficiency of the design.

(vii) Present actual examples with a discussion of the results.

18.2. Completely Randomized Design. (i) Suppose we have p different
treatments which are to be tested for yield-producing ability. Let
us assume that each treatment is planted on r different experimental units,
which we shall call plots; r is often called the number of replications for
each treatment. Then we might estimate the yield for the ith treatment
on the jth plot of its r plots as

$$Y_{ij} = \mu + \sum_{k=1}^{p} \tau_k X_{ki} + \epsilon_{ij},$$

where

$$X_{ki} = \begin{cases} 0 & \text{for } k \neq i \\ 1 & \text{for } k = i \end{cases} \qquad i = 1, 2, \ldots, p; \quad j = 1, 2, \ldots, r,$$

$\tau_i$ is the differential effect of the ith treatment (over the mean, $\mu$) and
$\sum_i \sum_k \tau_k X_{ki} = 0$. Some restriction such as the last one is necessary, since
there will be only p treatment totals to estimate (p + 1) parameters ($\mu$ and
the p $\tau$'s). If the restriction mentioned above $\{\sum_i \sum_k \tau_k X_{ki} = 0\}$ is used, $\mu$
will be the over-all mean effect.

Since the X's are 0 or 1, the regression model can be simplified to

$$Y_{ij} = \mu + \tau_i + \epsilon_{ij},$$

where $\sum_i \tau_i = 0$. As mentioned in Chap. 17, it is assumed that the $\epsilon_{ij}$ are
NID(0, $\sigma^2$) and independent of the $\tau_i$. Hence the expected mean yield for
the ith treatment is $\mu + \tau_i$. This assumes that there is no interaction
between the treatment and the error, that the treatment and error are
truly additive; otherwise, the difference between Y and $\mu + \tau_i$ on a
particular plot would not be the same for all treatments. The assumption
that all $\epsilon$ have the same variance, $\sigma^2$, enables us to estimate $\sigma^2$ even though
we have but one observation per plot. Actually it is assumed that the
possible errors in repeated sampling at one plot are N(0, $\sigma^2$), and we would
like to estimate $\sigma^2$ by repeated sampling at this plot. But it is impossible
in most experiments to do this (and still maintain the same other
conditions); hence, we estimate $\sigma^2$ from the plot-to-plot variation of a single
experiment. In order to do this, we must assume all $\epsilon$'s have the same
variance.
One method of distinguishing between the $\tau$'s and the $\epsilon$'s has been to
call the $\tau$'s fixed effects and the $\epsilon$'s random effects. By fixed effects, we imply
that all of the treatments about which inferences are to be made are
included in the experiment. A random effect is assumed to be one of a
large number of possible effects; in general, we shall regard the number of
possible effects to be infinite. However, it is not too difficult to set up the
theory corresponding to the assumption that the number of possible
random effects is a known finite number (greater than the number in the
sample). In the model used here, the $\epsilon$'s represent the differences in the
yielding abilities of the various plots, plus errors made in applying the
treatments to the plots, in measuring the size of the plots, and even in
measuring the actual yields produced. If the $\epsilon$'s represented only the
differences in the yielding ability of the plots, one might believe that the
number of $\epsilon$'s would be fixed; however, the other components of the $\epsilon$'s are
certainly unlimited in number. But we do not feel that even the yielding
ability of a given plot is fixed, because this will change with time and will
be influenced at a given time by particular environmental conditions
which affect that plot and not the others, such as weather for crops and
social and psychological factors with people. Hence, on all counts, it
seems reasonable to regard the plot errors ($\epsilon$) as random variables or
variates.

(ii) We wish to estimate the values of $\mu$, $\{\tau_i\}$, and $\sigma^2$ from an
experiment which has produced r yields for each of the p treatments.† The
method of experimentation is to select rp plots in the field and then to
allocate p treatments at random on these plots, with the restriction that
each treatment should appear on r plots.† Let us designate the estimates
of $\mu$ and $\tau_i$ as m and $t_i$. Hence the regression equation can be set up in
the form

$$Y_{ij} = m + t_i + e_{ij},$$

where the residual $e_{ij}$ is the estimate of $\epsilon_{ij}$, and m and the $\{t_i\}$ are to be
chosen so as to minimize $Se^2$.

† Or we may consider a survey with p classes and r samples from each class.

The normal equations are

$$\begin{aligned} m&: \quad rpm + r\sum_i t_i = SY = G, \\ t_i&: \quad rm + rt_i = S_j Y_{ij} = T_i, \qquad i = 1, 2, \ldots, p, \end{aligned}$$

where $T_i$ is the total yield for the ith treatment and G the grand total for
the experiment and where the m equation is found by differentiating $Se^2$
with respect to m, and similarly for $t_i$. If the $t_i$ equations are added
together, this sum is exactly the m equation. This shows that the
p treatment equations are not independent of the equation for m. In
order to solve these equations, an auxiliary relationship must be used.
Since $\sum_i \tau_i = 0$, a reasonable relationship is

$$\sum_i t_i = 0,$$

so that each $t_i$ measures a differential effect from the mean. Also, this
makes $m = G/rp = \bar{Y}$, the sample mean. And

$$t_i = (T_i/r) - m = (T_i - \bar{T})/r,$$

where $\bar{T} = G/p$. If the average yield for the ith treatment is desired, we
calculate

$$t'_i = t_i + m = T_i/r.$$

† The randomization feature is needed to assure that each treatment has the same
chance of appearing on a given plot. As pointed out in Sec. 17.2, randomization also
helps to validate the assumption of noncorrelated errors. The errors for adjacent
plots are often positively correlated. However, when we consider all possible
randomizations, any two $\epsilon$'s will apply equally often to adjacent and to distant plots.
Hence, if the errors tend to be as negatively correlated for distant plots as they are
positively correlated for adjacent plots (a rather reasonable assumption in many
experiments), the average correlation between these two $\epsilon$'s should be approximately
zero. And regardless of the plot-to-plot correlation pattern, randomization should
tend to minimize the effect of this correlation on the errors in the model.

A simple method of setting up the normal equations is to write down
the estimated value for each observation and then obtain the estimating
equation for each effect by adding together all observation equations
which contain this effect. If the experiment is designed so that the
various sets of effects are orthogonal, the resultant equation will be a
function of only the effect under consideration (plus the mean); otherwise
there will be a mixture of effects, necessitating the simultaneous solution
of several equations to estimate the effects (as was done in Chap. 15).
For our completely randomized design, the observation equations
involving $t_i$ are:

Yield                    Effects
$Y_{i1}$                 $m + t_i = t'_i$
$Y_{i2}$                 $m + t_i = t'_i$
. . .                    . . .
$Y_{ir}$                 $m + t_i = t'_i$
$T_i = S_j Y_{ij}$       $r(m + t_i) = rt'_i$

Hence $t'_i$ is simply $T_i/r$.

Since m appears in all of the observation equations, we must add all of
them together to compute m, giving us

$$SY = rpm + r\Big(\sum_i t_i\Big) = rpm.$$

Hence $m = SY/rp = \bar{Y}$, assuming $\sum_i t_i = 0$.
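
The role of the auxiliary relationship can be seen numerically. The sketch below, which uses hypothetical yields for p = 3 treatments with r = 2 plots each, builds the 0-or-1 design matrix, shows that it has only p independent columns for the p + 1 estimates, and then solves the least-squares problem with the restriction that the treatment estimates sum to zero appended as an extra row. The function name `fit_completely_randomized` is an assumption made for the example.

```python
import numpy as np

def fit_completely_randomized(yields):
    """Least-squares m and t_i under sum(t_i) = 0; yields[i][j] is plot j of treatment i."""
    y = np.asarray(yields, dtype=float)
    p, r = y.shape
    X = np.zeros((p * r, p + 1))
    X[:, 0] = 1.0                                  # column for the mean
    for i in range(p):
        X[i * r:(i + 1) * r, i + 1] = 1.0          # 0-or-1 column for treatment i
    rank = int(np.linalg.matrix_rank(X))           # p, not p + 1
    restriction = np.concatenate([[0.0], np.ones(p)])
    A = np.vstack([X, restriction])                # append sum(t_i) = 0
    b = np.concatenate([y.ravel(), [0.0]])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return rank, float(coef[0]), coef[1:]

# Hypothetical yields, used only to illustrate the algebra.
rank, m, t = fit_completely_randomized([[10, 12], [14, 16], [18, 20]])
print(rank, m, t)
```

The solution should agree with the closed forms above: m is the over-all mean, and each estimate is the treatment mean less m.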

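(iii) The error sum of squares is

$$\begin{aligned} \text{SSE} &= S_{ij}(Y_{ij} - m - t_i)^2 = S_{ij}\Big(Y_{ij} - \frac{T_i}{r}\Big)^2 \\ &= S_{ij} Y_{ij}^2 - \frac{\sum_i T_i^2}{r} = Sy^2 - \Big(\frac{\sum_i T_i^2}{r} - C\Big), \end{aligned}$$

where $Sy^2 = S_{ij}(Y_{ij} - \bar{Y})^2 = S_{ij}Y_{ij}^2 - C$ and $C = G^2/rp$. Hence the
reduction in sum of squares due to treatments, called the treatment sum of
squares (and indicated by SST), is

$$\text{SST} = \frac{\sum_i T_i^2}{r} - C.$$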


Another method of deriving this result is to use the abbreviated
Doolittle results. If we eliminate m from the $t_i$ equation, we have

$$rt_i = T_i - G/p = T_i - \bar{T}.$$

Hence the $A_{iy}$ value is $T_i - \bar{T}$, and the $B_{iy}$ value is $(T_i - \bar{T})/r$.
Therefore $\text{SST} = \sum_i (T_i - \bar{T})^2/r = (\sum_i T_i^2/r) - C$. This simple
result is obtained because the ith equation contains $t_i$ and no other t.


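Expressed in terms of the original model,

$$\begin{aligned} T_i - \bar{T} &= \Big(r\mu + r\tau_i + \sum_j \epsilon_{ij}\Big) - \frac{1}{p}\Big(rp\mu + r\sum_k \tau_k + \sum_k \sum_j \epsilon_{kj}\Big) \\ &= r(\tau_i - \bar{\tau}) + \frac{p - 1}{p}\sum_j \epsilon_{ij} - \frac{1}{p}\sum_{k \neq i}\sum_j \epsilon_{kj}, \end{aligned}$$

and

$$E(T_i - \bar{T}) = r(\tau_i - \bar{\tau}).$$

But it has been shown that $\sum_i \tau_i = p\bar{\tau} = 0$ is a necessary part of the
original model. Therefore

$$\begin{aligned} E[(T_i - \bar{T})^2] &= r^2\tau_i^2 + \Big[\Big(\frac{p - 1}{p}\Big)^2 + \Big(\frac{1}{p}\Big)^2 (p - 1)\Big] r\sigma^2 \\ &= r^2\tau_i^2 + \frac{r\sigma^2 (p - 1)}{p}, \end{aligned}$$

and

$$E(\text{SST}) = r\sum_{i=1}^{p} \tau_i^2 + (p - 1)\sigma^2.$$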
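
The expected value of SST can be checked by an empirical sampling study. The sketch below is illustrative only: the treatment effects, the general mean of 10, the error variance, and the function name `simulate_mean_sst` are all assumptions made for the example. It repeatedly generates yields from the additive model with normal errors and averages the resulting treatment sums of squares.

```python
import numpy as np

def simulate_mean_sst(tau, r, sigma, n_sims=20_000, seed=0):
    """Average SST over repeated experiments under Y_ij = mu + tau_i + eps_ij."""
    rng = np.random.default_rng(seed)
    tau = np.asarray(tau, dtype=float)
    p = tau.size
    eps = rng.normal(0.0, sigma, size=(n_sims, p, r))
    y = 10.0 + tau[None, :, None] + eps          # mu = 10 is arbitrary
    T = y.sum(axis=2)                            # treatment totals, one row per experiment
    G = T.sum(axis=1)                            # grand totals
    sst = (T ** 2).sum(axis=1) / r - G ** 2 / (r * p)
    return float(sst.mean())

tau = [-1.0, 0.0, 0.0, 1.0]                      # hypothetical effects, summing to zero
r, sigma = 5, 1.0
theory = r * sum(t * t for t in tau) + (len(tau) - 1) * sigma ** 2
print(simulate_mean_sst(tau, r, sigma), theory)
```

The simulated average should lie close to the theoretical value, which is 13 for these assumed effects.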


The analysis-of-variance table is:

Source of     Degrees of    Sum of     Mean square                     E(MS)†
variation     freedom       squares
Treatments    p - 1         SST        MST = SST/(p - 1)               $\sigma^2 + r\sum_i \tau_i^2/(p - 1)$
Error         p(r - 1)      SSE        $s^2 = \text{SSE}/p(r - 1)$     $\sigma^2$

† E(MS) = expected value of the mean square.


(iv) From Chap. 14, we know that SSE is distributed as $\chi^2\sigma^2$ with
n - p = rp - p = p(r - 1) degrees of freedom. If we set

$$s^2 = \text{SSE}/p(r - 1),$$

$E(s^2) = \sigma^2$. Hence we can use $F = \text{MST}/s^2$ to test $H_0$: $\{\tau_i = 0\}$ against
the alternative that some $\tau_i \neq 0$. The one-tailed F test is used since the
numerator of F is expected to be greater than the denominator when any
$\tau_i \neq 0$.

(v) Since $t'_i = T_i/r = \big(r\mu + r\tau_i + \sum_j \epsilon_{ij}\big)/r$ and $m = G/rp = \mu + \bar{\epsilon}$,
it is seen that

$$E(t'_i) = \mu + \tau_i, \qquad E(m) = \mu.$$

Also for any other treatment $t'_l = T_l/r$, and

$$\sigma(t'_i t'_l) = 0.$$

The estimated difference between the mean effects of these two
treatments is

$$d = t_i - t_l = t'_i - t'_l, \qquad E(d) = \delta = \tau_i - \tau_l.$$

Since $t'_i$ and $t'_l$ are noncorrelated, $\sigma^2(d) = 2\sigma^2/r$ and the confidence limits
for $\delta$ are

$$d - t_\alpha s\sqrt{2/r} < \delta < d + t_\alpha s\sqrt{2/r}.$$

Example 18.1. As a first example, we might consider an experiment
which was run to investigate the number of warp skein breaks on tent
twill in 5 consecutive days of testing with 4 breaks per test. The
results are:


                        Day
              1       2       3       4       5     Total
             30      30      40      45      55
             35      40      45      40      45
             25      45      55      35      60
             40      45      35      40      50
Total       130     160     175     160     210       835
Mean       32.5    40.0   43.75    40.0    52.5     41.75


Analysis of Variance

Source of     Degrees of    Sum of      Mean
variation     freedom       squares     square
Days          4             845.00      211.25
Error         15            668.75      44.58
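
The arithmetic of this example can be retraced with a short computation. The sketch below is only an illustration of the formulas of Sec. 18.2; the function name `one_way_anova` is an assumption, and the value of t (2.13 for 15 degrees of freedom) is taken from the t table rather than computed.

```python
import math

# Warp skein breaks from Example 18.1: 5 days, 4 observations per day.
breaks = {
    1: [30, 35, 25, 40],
    2: [30, 40, 45, 45],
    3: [40, 45, 55, 35],
    4: [45, 40, 35, 40],
    5: [55, 45, 60, 50],
}

def one_way_anova(groups):
    """Sums of squares, mean squares, and F for a completely randomized design."""
    totals = {k: sum(v) for k, v in groups.items()}
    n = sum(len(v) for v in groups.values())
    G = sum(totals.values())
    C = G * G / n                                        # correction term
    total_ss = sum(y * y for v in groups.values() for y in v) - C
    sst = sum(totals[k] ** 2 / len(groups[k]) for k in groups) - C
    sse = total_ss - sst
    df_t, df_e = len(groups) - 1, n - len(groups)
    mst, s2 = sst / df_t, sse / df_e
    return {"SST": sst, "SSE": sse, "df": (df_t, df_e), "MST": mst, "s2": s2, "F": mst / s2}

table = one_way_anova(breaks)
r = 4
s_d = math.sqrt(2 * table["s2"] / r)                     # standard error of a difference
d = sum(breaks[2]) / r - sum(breaks[1]) / r              # day 2 mean minus day 1 mean
t_15 = 2.13                                              # tabled 5 per cent t, 15 d.f.
limits = (d - t_15 * s_d, d + t_15 * s_d)
print(table)
print(round(s_d, 2), [round(x, 1) for x in limits])
```

The sums of squares, the F value, and the confidence limits obtained this way should agree with the figures quoted for this example.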


The error variance is estimated to be 44.58. There are significant
differences among the day means, since F = 211.25/44.58 = 4.74 with 4 and
15 degrees of freedom. Also, the standard error of the difference between
any two day means is given by $s(d) = \sqrt{2(44.58)/4} = 4.72$. Hence
the 95 per cent confidence limits for the difference between the mean
number of breaks for the first two days are

$$7.5 - 10.1 < \delta < 7.5 + 10.1 \quad \text{or} \quad -2.6 < \delta < 17.6,$$

where 10.1 = ts(d) = (2.13)(4.72), t with 15 degrees of freedom. This
example will be discussed further in Exercise 18.2. (See Fig. 18.1 for a
graphical presentation.)

Fig. 18.1. Number of warp skein breaks in 5 consecutive days of testing.

Example 18.2. A second example presents an analysis of the gains in
weight (grams per 100 days) of rats on a stock ration with various
amounts of gossypol added (Halverson and Sherwood, North Carolina
State College, 1932), as shown in Table 18.1.

Table 18.1

                       Amounts of gossypol added
                None     .04%     .07%     .10%     .13%      Total
                 228      186      179      130      154
                 229      229      193       87      130
                 218      220      183      135      130
                 216      208      180      116      118
                 224      228      143      118      118
                 208      198      204      165      104
                 235      222      114      151      112
                 229      273      188       59      134
                 233      216      178      126       98
                 219      198      134       64      100
                 224      213      208       78      104
                 220               196       94
                 232                        150
                 200                        160
                 208                        122
                 232                        110
                                            178
No. of rats       16       11       12       17       11         67
Sum            3,555    2,391    2,100    2,043    1,302     11,391
Mean           222.2    217.4    175.0    120.2    118.4      170.0





There is an added complication in the analysis of these data, because 
the number of rats varied from treatment to treatment. However, the 
analysis of variance proceeds quite simply, as follows: 


$$C = \frac{(11{,}391)^2}{67} = 1{,}936{,}640.$$

Sums of Squares

Total: $(228)^2 + (229)^2 + \cdots + (104)^2 - C = 178{,}985.$

Rations: $\dfrac{(3{,}555)^2}{16} + \dfrac{(2{,}391)^2}{11} + \dfrac{(2{,}100)^2}{12} + \dfrac{(2{,}043)^2}{17} + \dfrac{(1{,}302)^2}{11} - C = 140{,}083.$

Source of     Degrees of    Sum of       Mean
variation     freedom       squares      square
Rations       4             140,083      35,020.75
Error         62            38,902       627.45

In this experiment, $s^2$ = 627.45, F = 55.8, a highly significant value
($\alpha$ < .01). See Exercise 18.3 for more details.
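
With unequal numbers per treatment, the only change in the computation is that each squared treatment total is divided by its own number of observations. The sketch below retraces the sums of squares of this example from the data of Table 18.1; the variable names are assumptions made for the illustration.

```python
# Gains in weight from Table 18.1, one list per amount of gossypol added.
gains = {
    "none": [228, 229, 218, 216, 224, 208, 235, 229, 233, 219, 224, 220, 232, 200, 208, 232],
    ".04%": [186, 229, 220, 208, 228, 198, 222, 273, 216, 198, 213],
    ".07%": [179, 193, 183, 180, 143, 204, 114, 188, 178, 134, 208, 196],
    ".10%": [130, 87, 135, 116, 118, 165, 151, 59, 126, 64, 78, 94, 150, 160, 122, 110, 178],
    ".13%": [154, 130, 130, 118, 118, 104, 112, 134, 98, 100, 104],
}

n = sum(len(v) for v in gains.values())                 # total number of rats
G = sum(sum(v) for v in gains.values())                 # grand total
C = G ** 2 / n                                          # correction term

total_ss = sum(y * y for v in gains.values() for y in v) - C
rations_ss = sum(sum(v) ** 2 / len(v) for v in gains.values()) - C   # T_i^2 / n_i
error_ss = total_ss - rations_ss

df_rations = len(gains) - 1
df_error = n - len(gains)
s2 = error_ss / df_error
F = (rations_ss / df_rations) / s2

print(n, G, round(C))
print(round(total_ss), round(rations_ss), round(error_ss))
print(round(s2, 2), round(F, 1))
```

The printed sums of squares, error mean square, and F should reproduce the entries of the analysis of variance for this example.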

18.3. Randomized Complete Blocks. (i) Suppose that the rp plots
are divided into r blocks of p plots each and that each treatment is
assigned at random to one of these plots in each block. Then the
estimation equation is

$$Y_{ij} = m + t_i + b_j + e_{ij},$$

where $\beta_j$ is the differential effect of the jth block (j = 1, 2, . . . , r) being
estimated by $b_j$, and $\sum_j \beta_j = 0$. In this case we assume that the block
effects are also fixed, that is, that inferences are to be made for these
particular p treatments applied only to these r blocks. Since most experimentation
is set up to make inferences over a wider variety of experimental
conditions than we have in the particular blocks used in a given experiment, the
blocks usually are regarded as representative of a population which is
larger than the sample. The analysis of data under these assumptions is
not too different from that presented in this section; this problem will be
discussed in Chap. 23. It is further assumed that there is no real
treatment by block interaction. A discussion of the problem when there is
interaction (that is, treatment effects are not constant from block to
block) also will be presented in Chap. 23. (See the footnote on page
229 for a discussion of the need for randomization.) The equations
for a randomized complete-blocks experiment with 3 treatments and 4
blocks are indicated in Fig. 18.2. The treatments (1,2,3) were assigned
to the plots at random, the random arrangement being (1,3,2), (3,2,1),
(2,1,3), and (3,1,2).


Block 1                              Block 2

$Y = m + b_1 + t_1 + e_{11}$         $Y = m + b_2 + t_3 + e_{32}$
$Y = m + b_1 + t_3 + e_{31}$         $Y = m + b_2 + t_2 + e_{22}$
$Y = m + b_1 + t_2 + e_{21}$         $Y = m + b_2 + t_1 + e_{12}$

Block 3                              Block 4

$Y = m + b_3 + t_2 + e_{23}$         $Y = m + b_4 + t_3 + e_{34}$
$Y = m + b_3 + t_1 + e_{13}$         $Y = m + b_4 + t_1 + e_{14}$
$Y = m + b_3 + t_3 + e_{33}$         $Y = m + b_4 + t_2 + e_{24}$

Fig. 18.2. An example of the model equations for a randomized complete-blocks
design (r = 4, p = 3).
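A randomization plan of this kind is easily drawn by machine. The sketch below is one way to do it, assuming a separate random permutation of the treatments within each block; the function name and the seed are ours and are only illustrative.

```python
import random


def randomized_complete_blocks(treatments, n_blocks, seed=None):
    """Return one independent random ordering of the treatments per block."""
    rng = random.Random(seed)         # a seed makes the plan reproducible
    plan = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)            # random assignment within this block only
        plan.append(tuple(order))
    return plan


# Three treatments in four blocks, as in Fig. 18.2.
plan = randomized_complete_blocks([1, 2, 3], n_blocks=4, seed=1)
for block, order in enumerate(plan, start=1):
    print(f"Block {block}: {order}")
```

Every block contains each treatment exactly once; only the order within the block is left to chance.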

(ii) The least-squares equations are

$$m:\quad rpm + r\sum_i t_i + p\sum_j b_j = G,$$
$$t_i:\quad r(m + t_i) + \sum_j b_j = T_i,$$
$$b_j:\quad p(m + b_j) + \sum_i t_i = B_j,$$

where $B_j$ is the total yield for the jth block and the $b_j$ equation is found by
minimizing $Se^2$ with respect to $b_j$. In this case, two auxiliary equations
are required; in order to make $m = \bar{Y} = G/rp$, an unbiased estimate of
$\mu$, we set $\sum t_i = \sum b_j = 0$. Hence $t_i = (T_i - \bar{T})/r$ and $b_j = (B_j - \bar{B})/p$,
where $\bar{T} = G/p$ and $\bar{B} = G/r$. We let

$$t_i' = t_i + m = T_i/r; \qquad b_j' = b_j + m = B_j/p.$$

(iii) Using the abbreviated Doolittle approach, we see that the t
equations are the same as in Sec. 18.2 (iii) and that

$$p b_j = B_j - \bar{B}.$$

Hence the reduction in sum of squares due to blocks and treatments is

$$\frac{\sum_i (T_i - \bar{T})^2}{r} + \frac{\sum_j (B_j - \bar{B})^2}{p}.$$

Therefore the added reduction due to blocks alone (indicated by SSB)
must be

$$\text{SSB} = \frac{\sum_j (B_j - \bar{B})^2}{p}.$$
We note that

$$r t_i = T_i - \bar{T} = r\tau_i + \sum_j \epsilon_{ij} - \frac{1}{p}\sum_i\sum_j \epsilon_{ij},$$
$$p b_j = B_j - \bar{B} = p\beta_j + \sum_i \epsilon_{ij} - \frac{1}{r}\sum_i\sum_j \epsilon_{ij}.$$

Hence $E(t_i) = \tau_i$ and $E(b_j) = \beta_j$. Furthermore, it can be shown that
$\sigma[(T_i - \bar{T})(B_j - \bar{B})] = 0$. This also proves that the treatment and
block effects are noncorrelated; hence, SST and SSB are orthogonal parts
of the total reduction (the total reduction equals SST plus SSB).

The analysis of variance table is:

Source of variation   Degrees of freedom   Sum of squares   Mean square   E(MS)
Blocks                r - 1                SSB              MSB           $\sigma^2 + p\sum\beta_j^2/(r - 1)$
Treatments            p - 1                SST              MST           $\sigma^2 + r\sum\tau_i^2/(p - 1)$
Error                 (r - 1)(p - 1)       SSE              $s^2$         $\sigma^2$


(iv) $s^2 = \text{SSE}/(r - 1)(p - 1)$, and $F = \text{MST}/s^2$ to test for over-all
treatment differences. Note that $\text{SSE} = Sy^2 - \text{SST} - \text{SSB}$. There is
usually no reason to test for block differences, since the blocks are generally
chosen to be different.

(v) As in 18.2, $\sigma^2(t_i') = \sigma^2/r$, and $\sigma^2(d) = 2\sigma^2/r$.

(vi) The experimenter often is interested in estimating how much he
reduced his estimated error variance by the imposition of the block
restrictions on the design. In other words, how much would he have
increased his estimate of the error variance if the treatments had been
randomly distributed over all rp plots, instead of being randomly distributed
over the p plots of each of r blocks? Let us call $s_r^2$ the error
variance for the completely randomized design and $s_b^2$ that for the
randomized complete-blocks design. Then the estimated efficiency of
the latter design relative to the former is given by

$$I = s_r^2/s_b^2.$$

Since the estimated variance of the difference between two treatment
means is

$$s^2(d) = 2s^2/r,$$

rI measures the number of replications needed for a completely randomized
design to give the same value of $s^2(d)$ as r replications in a randomized
complete-blocks design.
Since we have only the analysis of variance for the randomized complete-blocks
design, it is necessary to use these data to estimate $s_r^2$. This
estimate is as follows:

$$s_r^2 = \frac{s_b^2(\text{error d.f.} + \text{treatment d.f.}) + \text{SSB}}{\text{error d.f.} + \text{treatment d.f.} + \text{block d.f.}} = \frac{r(p - 1)s_b^2 + \text{SSB}}{rp - 1},$$

where d.f. stands for "degrees of freedom." We shall derive this formula
by considering a uniformity trial, in which all the treatments are alike.
In this case $s_r^2$ is the total sum of squares for a completely randomized
design divided by (rp - 1). But since we are concerned with the same
field as with our randomized complete-blocks design, SSB and SSE will
remain unchanged (on the average for all possible randomizations).
Since we are considering a uniformity trial, the treatment sum of squares
will be an estimate of $(p - 1)\sigma^2$; hence, the best estimate of this treatment
sum of squares will be $(p - 1)s_b^2$. Adding these together and dividing
by (rp - 1), we find that

$$s_r^2 = \frac{\text{SSE} + (p - 1)s_b^2 + \text{SSB}}{rp - 1} = \frac{r(p - 1)s_b^2 + \text{SSB}}{rp - 1}.$$

We note that if there are no block effects, $E(s_r^2) = \sigma^2$; however, if there
are block effects, $E(s_r^2) > \sigma^2$.

At first glance one might think that, since the only difference between
the two designs is that the block effects are included in $s_r^2$ and not in $s_b^2$,
the best estimate of $s_r^2$ would be (SSB + SSE)/p(r - 1). However,
since the total sum of squares is assumed the same for both designs, this
would assume (falsely) that SST is the same for both designs. Cochran
and Cox present a more rigorous proof, which avoids the use of the uniformity
device by estimating the average value of SST for both designs.

Actually the efficiency is slightly less than I, because there is a loss in
the number of error degrees of freedom when the block sum of squares is
removed from the error. Fisher calculates the amount of information
which d furnishes concerning $\delta$, the true difference between two means.
If $\sigma^2$ were known, the amount of information would be proportional to $1/\sigma^2$;
with $\sigma^2$ estimated by $s^2$, the amount of information is $(n + 1)/(n + 3)s^2$,
where there are n degrees of freedom for the error mean square. Hence if
there are $n_1$ degrees of freedom for one design with error variance $s_1^2$ and $n_2$
degrees of freedom for a second design with error variance $s_2^2$, the efficiency
of the first design relative to the second is

$$\frac{(n_1 + 1)(n_2 + 3)s_2^2}{(n_1 + 3)(n_2 + 1)s_1^2}.$$

Since we have set $s_2^2/s_1^2 = I$, I would be multiplied by

$$h = \frac{(n_1 + 1)(n_2 + 3)}{(n_1 + 3)(n_2 + 1)}$$

to adjust for the loss in degrees of freedom.

Cochran and Cox (pages 27 to 28) discuss several other methods of
adjusting for the loss of degrees of freedom. One might consider yet
another adjusting constant

$$h' = \frac{F_\alpha[(p - 1), n_2]}{F_\alpha[(p - 1), n_1]},$$

where there are (p - 1) degrees of freedom for treatments and $n_2$ and $n_1$
respective degrees of freedom for error. If testing is done at the $\alpha = .05$
significance level, $F_{.05}$ should be used.

Example 18.3. This example presents the analysis of the differences
between fat absorption by doughnuts in 8 different fats, all being tested
on each of 6 days.

Grams of Fat Absorbed by Mixes of 24 Doughnuts during Cooking Period

Day                      Fat (treatment) number                        Total
(block)      1       2       3       4       5       6       7       8
1          164     172     177     178     163     163     150     164     1,331
2          177     197     184     196     177     193     179     169     1,472
3          168     167     187     177     144     176     146     155     1,320
4          156     161     169     181     165     172     141     149     1,294
5          172     180     179     184     166     176     169     170     1,396
6          195     190     197     191     178     178     183     167     1,479
Total    1,032   1,067   1,093   1,107     993   1,058     968     974     8,292
Mean     172.0   177.8   182.2   184.5   165.5   176.3   161.3   162.3     172.8


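Computations

$$C = \frac{(8{,}292)^2}{48} = 1{,}432{,}443,$$

$$Sy^2 = (164)^2 + (172)^2 + \cdots + (167)^2 - 1{,}432{,}443 = 9{,}143.00,$$

$$\text{SST} = \frac{(1{,}032)^2 + (1{,}067)^2 + \cdots + (974)^2}{6} - 1{,}432{,}443 = 3{,}344.33,$$

$$\text{SSB} = \frac{(1{,}331)^2 + (1{,}472)^2 + \cdots + (1{,}479)^2}{8} - 1{,}432{,}443 = 3{,}986.75,$$

$$\text{SSE} = Sy^2 - \text{SST} - \text{SSB} = 1{,}811.92.$$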
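These computations can be reproduced directly from the table of yields. The sketch below is ours and not part of the original example; it takes the entry for day 3, fat 8 as 155, the value that agrees with the printed totals for day 3 (1,320) and for fat 8 (974).

```python
# Rows are days (blocks); columns are fats (treatments) 1 to 8.
# Day 3, fat 8 is entered as 155, the value consistent with the printed totals.
fat = [
    [164, 172, 177, 178, 163, 163, 150, 164],
    [177, 197, 184, 196, 177, 193, 179, 169],
    [168, 167, 187, 177, 144, 176, 146, 155],
    [156, 161, 169, 181, 165, 172, 141, 149],
    [172, 180, 179, 184, 166, 176, 169, 170],
    [195, 190, 197, 191, 178, 178, 183, 167],
]
r, p = len(fat), len(fat[0])                    # 6 blocks, 8 treatments

grand_total = sum(sum(row) for row in fat)      # G
correction = grand_total ** 2 / (r * p)         # C = G^2 / rp

block_totals = [sum(row) for row in fat]        # B_j
treatment_totals = [sum(col) for col in zip(*fat)]  # T_i

total_ss = sum(y * y for row in fat for y in row) - correction
sst = sum(t * t for t in treatment_totals) / r - correction
ssb = sum(b * b for b in block_totals) / p - correction
sse = total_ss - sst - ssb

s2 = sse / ((r - 1) * (p - 1))                  # error mean square
f_ratio = (sst / (p - 1)) / s2                  # F for over-all fat differences

print(round(total_ss, 2), round(sst, 2), round(ssb, 2), round(sse, 2), round(s2, 2))
```

Treatment totals are divided by r and block totals by p because those are the numbers of plots entering each total.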
Analysis of Variance

Source of variation    Degrees of freedom    Sum of squares    Mean square
Between day means               5               3,986.75         797.35
Between fat means               7               3,344.33         477.76**
Error                          35               1,811.92          51.77 = $s^2$

** Significant at 1 per cent probability level.

We note that the efficiency of this design as compared with the completely
randomized design is

$$I = \frac{42(51.77) + 3{,}986.75}{47(51.77)} = \frac{6{,}161.09}{2{,}433.19} = 2.53.$$

In this case h = (36)(43)/(38)(41) = .994, which is almost unity; h' is
slightly less. This indicates that if the fats had been distributed randomly
over the 6 days, the error variance would have been expected to be
about 2.5 times as large as that with the randomized blocks design, where
each fat was used each day. In other words, we have made our estimates
of fat differences about 2½ times as precise by planning the experiment so
that every fat was used on each day of the experiment.
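The efficiency and the adjustment for degrees of freedom follow mechanically from the analysis of variance. The sketch below, with function names of our own choosing, applies the formulas of Sec. 18.3 (vi) to the values SSB = 3,986.75 and $s^2$ = 51.77 of this example.

```python
def relative_efficiency(ssb, s2_blocks, r, p):
    """Estimated efficiency I of randomized blocks relative to complete randomization."""
    # Estimated error variance had the treatments been randomized over all rp plots.
    s2_randomized = (r * (p - 1) * s2_blocks + ssb) / (r * p - 1)
    return s2_randomized / s2_blocks


def df_adjustment(n1, n2):
    """Factor h allowing for the difference in error degrees of freedom."""
    return (n1 + 1) * (n2 + 3) / ((n1 + 3) * (n2 + 1))


r, p = 6, 8                           # 6 days (blocks), 8 fats (treatments)
efficiency = relative_efficiency(ssb=3986.75, s2_blocks=51.77, r=r, p=p)

n1 = (r - 1) * (p - 1)                # error d.f., randomized complete blocks
n2 = p * (r - 1)                      # error d.f., completely randomized
h = df_adjustment(n1, n2)

print(round(efficiency, 2), round(h, 3), round(h * efficiency, 2))
```

With 35 and 40 error degrees of freedom the adjustment is so near unity that it scarcely alters the estimated efficiency.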

The 95 per cent confidence limits for the difference between any two
fat means are $(d - 8.43 < \delta < d + 8.43)$. A word of caution about the
use of these confidence limits should be given. These are average confidence
limits, assuming that you select the treatments to be compared
in advance of the experiment. If you wait until the experiment is over
and then select, for example, the highest and lowest treatment means to
compare, the confidence limits are much wider than $[d \pm t_\alpha s(d)]$. One
excellent article on this point is presented by J. W. Tukey. R. A.
Fisher (page 58) advocates the following approximate procedure for
testing the largest against the smallest in a set of p means: Compute the
significance probability by the ordinary t test, and multiply this by
p(p - 1)/2. Here p(p - 1)/2 is the number of ways of drawing 2 from p
means, and we are selecting the most extreme of these combinations;
since the probability of drawing this extreme set is 2/p(p - 1), it seems
reasonable to divide the probability of obtaining such a difference (by use
of the t test) on the average by the probability of drawing the extreme
set.
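Fisher's approximate procedure amounts to a single multiplication, as the following sketch shows. The function name is ours, the cap at 1 is an assumption added so that the result remains a probability, and the significance probability used in the illustration is hypothetical rather than taken from the example.

```python
from math import comb


def extreme_pair_probability(t_test_probability, n_means):
    """Multiply the ordinary t-test probability by the number of pairs of means."""
    n_pairs = comb(n_means, 2)        # p(p - 1)/2 ways of drawing 2 from p means
    # The cap at 1 is our own addition, not part of the procedure in the text.
    return min(1.0, t_test_probability * n_pairs)


# With the 8 fat means there are 28 possible pairs. The value .001 is a
# hypothetical t-test probability used only to illustrate the arithmetic.
adjusted = extreme_pair_probability(0.001, n_means=8)
print(comb(8, 2), adjusted)
```

A difference between the extreme pair must therefore reach a much smaller nominal probability before it can be called significant.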

18.4. Latin-square Designs. A design slightly more complicated
than the randomized complete-blocks design is the Latin-square design.
In this design, each treatment is assigned at random within a row and a
column so that all treatments appear in each row and column. Hence it
is possible to adjust the error for plot heterogeneity in two directions. Of
course the rows may be fields and the columns similar locations in these
fields. The basic design looks as follows for 3 treatments (there then
must be 3 rows and 3 columns):

           $C_1$        $C_2$        $C_3$       Total
$R_1$    $T_1$(23)    $T_2$(17)    $T_3$(29)      69
$R_2$    $T_2$(16)    $T_3$(25)    $T_1$(16)      57
$R_3$    $T_3$(24)    $T_1$(18)    $T_2$(12)      54
Total       63           60           57         180

The figures represent yields in an experiment. In the field the rows and
columns of the basic design would have been randomized. One such field
arrangement would be:

           $C_1$     $C_2$     $C_3$
$R_1$      $T_3$     $T_1$     $T_2$
$R_2$      $T_2$     $T_3$     $T_1$
$R_3$      $T_1$     $T_2$     $T_3$

In general it is also advisable to assign the treatment numbers at random
to the treatments.

The regression model for this design is:

$$Y_{ijk} = \mu + \alpha_i + \gamma_j + \tau_{k(ij)} + \epsilon_{ijk},$$

where $\alpha_i$ and $\gamma_j$ are fixed row and column effects and $\tau_k$ the fixed treatment
effect. We note that, once i and j are specified in a particular field
arrangement, we know k. Hence k is a function of i and j. For example,
in the field arrangement mentioned above, if i = 2 and j = 3, k = 1.

The analysis-of-variance table is similar to that of the randomized complete-blocks
design except that there are two sets of blocks, rows and
columns. The analysis of the 3 × 3 example is as follows:


Source of variation    Degrees of freedom    Sum of squares    Mean square
Rows                            2                  42              21
Columns                         2                   6               3
Treatments                      2                 186              93
Error                           2                   6               3

The F value to test for treatment differences is F = 31 with (2,2) degrees
of freedom, which is significant at the $\alpha = .05$ significance level.
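The sums of squares in this table can be obtained from the yields of the basic design. The following sketch, in which the layout of the data and the variable names are ours, carries out the computation for the 3 × 3 example.

```python
# Basic 3 x 3 design: each cell holds (treatment number, yield).
square = [
    [(1, 23), (2, 17), (3, 29)],
    [(2, 16), (3, 25), (1, 16)],
    [(3, 24), (1, 18), (2, 12)],
]
r = len(square)                                   # rows = columns = treatments

grand_total = sum(y for row in square for _, y in row)
correction = grand_total ** 2 / (r * r)           # C = G^2 / r^2

row_totals = [sum(y for _, y in row) for row in square]
col_totals = [sum(row[j][1] for row in square) for j in range(r)]
treatment_totals = {}
for row in square:
    for k, y in row:
        treatment_totals[k] = treatment_totals.get(k, 0) + y

ss_total = sum(y * y for row in square for _, y in row) - correction
ss_rows = sum(t * t for t in row_totals) / r - correction
ss_cols = sum(t * t for t in col_totals) / r - correction
ss_treatments = sum(t * t for t in treatment_totals.values()) / r - correction
ss_error = ss_total - ss_rows - ss_cols - ss_treatments

df_error = (r - 1) * (r - 2)
f_ratio = (ss_treatments / (r - 1)) / (ss_error / df_error)

print(ss_rows, ss_cols, ss_treatments, ss_error, f_ratio)
```

The error sum of squares is again found by subtraction, rows and columns both being removed from the total.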

18.5. Summary. In order to help the experimenter decide what 
design to use, we are including a summary (Table 18.2) of some of the 
advantages and disadvantages of the three designs discussed in this 
chapter. 
Table 18.2

Feature of the experiment | Completely randomized | Randomized complete blocks | Latin square
Replications per treatment | Need not be the same | Same for all treatments | Same for all treatments
Number of treatments (p) | Unlimited, except for plot variability | If too many, may lose advantage of blocks | Usually between 5 and 10, because r = p
Ease of setting up experiment | Easy | Fairly easy, but must set up blocks of required size | More difficult than other two
Error degrees of freedom | Maximum number | Lose blocks | Lose both rows and columns
High plot variability | Quite bad | Takes care of variability in one direction | Very good if variability in two directions
Test of unequal error variance | Easy | Difficult, but often can pull out sets of error d.f. | Very difficult
Missing data | No difficulty | Not too bad, but some loss of efficiency | Quite bad if several missing plots
Effect on analysis of error in assignment of treatments to plots | Minor | May require more complicated analysis | May be badly upset


EXERCISES 

18.1. Assume you have $r_i$ samples for the ith treatment. (a) Show
that you can obtain an estimate of $\sigma^2$ from each treatment with ($r_i$ - 1)
degrees of freedom. (b) How can you use this result to test the assumption
of equal variances?

18.2. (a) In Example 18.1, use the method of orthogonal polynomials,
outlined in Chap. 16, to divide the sum of squares for days (SSD)
into four orthogonal parts, each with 1 degree of freedom (linear,
quadratic, cubic, and quartic). For example, the polynomial values
($P_1'$) for the linear effect are (-2, -1, 0, 1, 2). Consider the totals as
the Y values, and show that $SP_1'Y = 160$. It has been proved that
$\text{SSR}_1 = (SP_1'Y)^2/S(P_1')^2 = 25{,}600/10 = 2{,}560$, but this is larger than
SSD. Under the null hypothesis of no day differences, show that
$E(\text{SSR}_1) = 4\sigma^2$. Hence to put $\text{SSR}_1$ on the same basis as $s^2$ (whose
expected value is $\sigma^2$), we must divide $\text{SSR}_1$ by r = 4, so that the linear
effect is 2,560/4 = 640. (See Fig. 18.1.)

(b) Complete the analysis of the quadratic, cubic, and quartic effects.
Is there a significant departure from linearity? Hence what do you conclude
regarding the day-to-day fluctuations in the quality of the product?

(c) Suppose we had used the means as the Y values. What changes
must be made in the above analyses?

(d) Can you make any general statement of how to analyze by orthogonal
polynomials p equally spaced sets of observations with r observations
at each point?

18.3. (a) Set up the mathematical model for Example 18.2. Solve the
normal equations, and show that the analysis presented there is correct.
Show how the observation equations can be used to estimate the effect of
the .04 per cent treatment.

(b) Prove that the standard error of the difference between the effects
of any two treatments is $s\sqrt{1/r_1 + 1/r_2}$, where $r_1$ and $r_2$ are the numbers
of rats on the two rations. Hence show that the standard error of the
difference between the gains for rations no Gossypol and .04 per cent is
9.81.

(c) What is the standard error of the average of the first two against
the third ration? The third against the last two?

(d) How would you test for departure from linearity in this case?
Graph these results.

(e) Is there any evidence that the error variance is not the same for all
treatments?

18.4. An investigation of North Carolina farmers' retail produce markets
was made in 1948. Data were collected on the dollar value of livestock
owned by a sample of sellers on these markets, shown in the accompanying
table.


Data of Exercise 18.4

Market            1      2      3      4      5      6      7      8      9     10     11     12     13    Total
                721    750  3,480    627    469    812    393    369    332    249  1,106  1,703  1,402
                 64    756    293    169    604    271    841    785    842    371  1,702    563  1,088
                664  1,315    370    976    165  1,100    426    655           361  1,154    714    273
                134    293    284    109           305         1,947         1,061    962    908  1,080
                610    865    119    704                                       359            97  2,195
                546  3,274  1,980  1,557                                     1,026           442    428
                278                  697                                       477                  310
                 29                                                                                 295
                                                                                                    178
Total         3,046  7,253  6,526  4,839  1,238  2,488  1,660  3,756  1,174  3,904  4,924  4,427  7,309   52,544
No. sellers       8      6      6      7      3      4      3      4      2      7      4      6      9       69
Mean            381  1,209  1,088    691    413    622    553    939    587    558  1,231    738    812    761.5

(a) Analyze these data to see whether or not there are any real differences
among markets.

(b) Compare the mean value per seller for the first nine markets with
the mean value for markets 10 and 11.

(c) Is there any evidence of unequal variation for these data?

18.5. Snedecor and Cox analyzed some data on the gain in weight
(grams) of 149 Wistar rats over a 6 weeks' period for 4 successive generations.
The gains for the males and females in each of these generations
were:

                              Male                                   Female                    Total
Generation         1        2        3        4         1        2        3        4
No. of rats       21       15       12        7        27       25       23       19         149
Total gain     3,716    2,422    1,868    1,197     2,957    2,852    2,496    2,029      19,537
Mean gain     176.95   161.47   155.67   171.00    109.52   114.08   108.52   106.79      131.12

The total sum of squares adjusted for the mean was $Sy^2 = 176{,}836$.

(a) Set up the analysis of variance among and within the 8 classes and
determine whether or not there are significant differences among the 8
mean gains.

(b) What is the standard error to test for the difference between the
mean gain for the males of a given generation and the mean gain for the
females of the same generation? What would you say regarding the
differences between the population means for each of the 4 generations?

(c) Can you make a statement regarding the difference between males
and females over all 4 generations?

18.6. The relationship between the vitamin $B_2$ content of turnip greens
and the average soil moisture tension (Chap. 15) can be analyzed as a
completely randomized design with p = 3 and r = 9.

(a) Set up the analysis of variance of the vitamin $B_2$ content with the
three soil moistures as treatments.

(b) Test whether or not there was a significant departure from linearity
in this case.

(c) Is there any evidence that the error variance is not the same for the
three treatments?
18.7. Derive the observation equations for $t_i$ and $b_j$, and show how
they can be used to obtain the normal equations (Sec. 18.3).

18.8. (a) Set up another randomization plan for Fig. 18.2, and write
down the model equations. How many different randomizations can be
devised for 4 blocks and 3 treatments?

(b) Draw up a randomization plan for Example 18.3.

18.9. In the investigation of farmers' markets (Exercise 18.4), the
incomes of a sample of regular, former, and potential patrons were studied
at 10 markets with the results given in the accompanying table.


Data of Exercise 18.9
Average Monthly Incomes ($10)

City             Regular    Former    Potential    Total
Asheville†          30        30         28          88
Asheboro            39        36         27         102
Charlotte†          41        37         28         106
Durham†             29        29         24          82
Greensboro†         32        27         22          81
Raleigh†            29        27         29          85
Goldsboro           25        24         20          69
Rocky Mount         27        28         27          82
Franklin            30        29         24          83
Jacksonville        29        28         27          84
Total              311       295        256         862

† Cities of over 50,000 population.

(a) Would you conclude that there are any real income differences
among the three groups of people?

(b) What are the confidence limits for the difference between the average
income of the regular and potential patrons?

(c) Is there any feature of this analysis which might be open to question?

18.10. An analysis was made of the fiber diameters in microns at 6
different regions on the seed coat of the Mexican 128 variety of cotton.
The analysis was made on a sample of 10 seeds as shown in the accompanying
table.

Data of Exercise 18.10

Plant                       Region                           Total
            1       2       3       4       5       6
A       16.49   17.80   17.54   16.75   17.54   17.54     103.66
B       15.45   15.96   15.71   14.13   14.40   14.40      90.05
C       16.23   15.96   16.49   14.92   14.66   14.92      93.18
D       18.33   17.28   16.49   16.49   17.28   17.80     103.67
E       16.49   18.33   17.54   17.02   17.28   18.06     104.72
F       16.49   17.54   17.05   15.71   15.45   14.66      96.90
G       15.96   15.71   16.23   16.49   15.18   16.49      96.06
H       16.75   16.23   14.66   15.96   13.35   16.75      93.70
I       14.40   18.33   17.02   14.66   15.71   17.02      97.14
J       16.49   17.02   16.75   17.54   15.71   16.49     100.00
Total  163.08  170.16  165.48  159.67  156.56  164.13     979.08

(a) Show that the analysis of variance is correct, and fill in the degrees
of freedom.

Source of variation    Degrees of freedom    Mean square
Plants                                           4.15
Regions                                          2.22
Error                                             .692

(b) What statement would you make regarding region differences?

(c) What is the standard error of a region mean?

(d) What are the confidence limits for the difference between regions 1
and 6?

18.11. Middleton and Chapman conducted an experiment to determine
the best variety out of eight varieties of oats at Laurinburg, N.C., in
1940. The yields of grain in grams for a 16-foot row were as shown in
the accompanying table.

Data of Exercise 18.11

Variety                    Blocks                        Sum      Mean
              I       II      III      IV       V
1            296     357     340     331     348       1,672     334.4
2            402     390     431     340     320       1,883     376.6
3            437     334     426     320     296       1,813     362.6
4            303     319     310     260     242       1,434     286.8
5            469     405     442     487     394       2,197     439.4
6            345     342     358     300     308       1,653     330.6
7            324     339     357     352     220       1,592     318.4
8            488     374     401     338     320       1,921     384.2
Sum        3,064   2,860   3,065   2,728   2,448      14,165     354.1

(a) Determine whether or not there are significant differences among
the varieties. Which variety would you recommend?

(b) What is the efficiency of this design compared with a completely
randomized design?

(c) What is the standard error for the difference between two varietal
means?

(d) Set up a field plan for conducting this experiment.

18.12. Frequently one or more plots in a randomized blocks experiment
will be missing because of some adverse condition such as a washout or
insect infestation. As a result the orthogonal properties of the analysis
are upset, and the statistician must either use an approximate analysis or
run a complete least-squares analysis. Let us assume in the oats experiment
of Exercise 18.11 that the plot for variety 1 in block I had been
washed out. An approximate method of analysis is to set this yield
equal to y, run up the analysis of variance in terms of y and the other
yields, and estimate y by minimizing SSE. Then substitute y for the
missing plot yield, and complete the analysis, decreasing the error degrees
of freedom by one.

(a) Show that $y = (pT + rB - G)/(r - 1)(p - 1)$, where T and B
are the yields for the treatment and block with the missing plot.

(b) Estimate y for the oats experiment, and compute the analysis of
variance.

(c) Make a complete least-squares analysis of the data, regarding the
missing plot as nonexistent; in other words, we have only 4 plots for
variety 1 and 7 plots in block I. Show that you obtain the same value of
SSE as in (b) but that SST in (b) is too large by $(2{,}768 - 7y)^2/56$.

18.13. Prove that $\sigma[(T_i - \bar{T})(B_j - \bar{B})] = 0$ in Sec. 18.3.

18.14. (a) The student should set up the least-squares equations and
the analysis-of-variance table for an r × r Latin-square design and indicate
the expected values of the mean squares in the analysis of variance.

(b) What is the standard error of the difference between any two treatment
means?

(c) Show that the efficiency (neglecting the loss of degrees of freedom)
of the rows in reducing the error variance in an (r × r) Latin-square
design as compared with a randomized complete-blocks design with the r
columns as blocks is given by

$$I = \frac{\text{SSR} + (r - 1)^2 s_l^2}{r(r - 1)s_l^2},$$

where SSR is the sum of squares for rows and $s_l^2$ is the error mean square
for the Latin-square design. Similarly the efficiency of the columns is
found by replacing SSR by SSC (sum of squares for columns). Hint:
Use the same method as given in Sec. 18.3 (vi) for the efficiency of the
randomized complete-blocks design as compared with a completely
randomized design.

(d) What is h in this case? What is h'?

18.15. Given a Latin-square design with r rows, columns, and treatments.
Assume that the data for one plot were lost. Show that the
estimate of the missing value found by minimizing SSE is

$$y = \frac{r(R + C + T) - 2G}{(r - 1)(r - 2)},$$

where R, C, and T are, respectively, the totals of the row, column, and
treatment which contain the missing value.

18.16. A Latin-square design was used at the University of Hawaii to
compare 6 different legume intercycle crops for pineapples. The yields
in 10-gram units are given in the accompanying table.

Data of Exercise 18.16

                                                            Row totals
B 220    F  98    D 149    A  92    E 282    C 169             1,010
A  74    E 238    B 158    C 228    F  48    D 188               934
D 118    C 279    F 118    E 278    B 176    A  65             1,034
E 295    B 222    A  54    D 104    C 213    F 163             1,051
C 187    D  90    E 242    F  96    A  66    B 122               803
F  90    A 124    C 195    B 109    D  79    E 211               808

Col. totals
  984    1,051      916      907      864      918             5,640

(a) Make a complete analysis of this experiment, and state which
legume you would recommend for planting.

(b) Use the Tukey method to determine whether or not there are
significant differences among the three top treatments.

(c) Was the Latin-square design useful in this case (see Exercise 18.14)?

(d) Set up another field plan for this experiment.

18.17. Professor John Wishart furnishes us this exercise:

"An experiment on the use of nitrogenous fertilizers on wheat was
arranged as a 5 × 5 Latin square, each plot being acre in size. The
control plot (having no fertilizer) is denoted by O; S marks the plots
which received a single dressing of sulphate of ammonia in March, while
SS marks the plots which received the same total dressing, but in 6
monthly dressings from November to April. Plots which received
cyanamide in October to an equivalent amount (in nitrogen) as the
others are marked C, while D marks plots which received half their dressing
as cyanamide and half as dicyanodiamide. The plan is given below,
the numbers denoting the yields of the plots in pounds. Conduct the
statistical analysis of the data, measuring the significance of the effect of
applying a nitrogenous dressing, of whatever kind; also see if you can
determine, by a statistical test, whether some forms of dressing are more
effective than others. Set out a full summary table with your conclusions.

D  72.2    SS 55.4    O  36.6    C  67.9    S  73.0
O  36.4    C  46.9    SS 46.8    S  54.9    D  68.5
SS 71.5    S  55.6    D  71.6    O  67.5    C  78.4
S  68.9    O  53.2    C  69.8    D  79.6    SS 77.2
C  82.0    D  81.0    S  76.0    SS 87.9    O  70.9

The sum of squares of the 25 numbers in the table is 113,574.73."
CHAPTER 19

THE ANALYSIS OF INCOMPLETE-BLOCKS DESIGNS


19.1. Introduction. In this chapter, we shall discuss the use of designs
in which the number of plots per block is less than the number of treatments.
As before, we assume p treatments, but with only k plots per
block (k < p). Assume further that each treatment is applied to r plots
(each treatment is replicated r times) and that there are a total of q blocks
(q > r). Hence there are a total of N = kq = rp plots in the experiment.
The ith treatment appears in the jth block $n_{ij}$ times ($n_{ij}$ = 0 or 1). Hence
$\sum_j n_{ij} = r$, and $\sum_i n_{ij} = k$. This design is called an incomplete-blocks
design.

It might be noted that there are two important situations for which 
these incomplete-blocks designs are used: 

(a) The number of possible plots per block is less than p. Examples of 

this are: 

(i) Nutrition studies on animals where the block is a litter, the 
individual animal is the plot, and the smallest litter size is 
smaller than the number of rations to be studied. 

(ii) Chemical experiments where the block is a day, an individual 
analysis is the plot, and the number of possible analyses in a 
day is less than the number of treatments. 

(iii) Tasting experiments where the block is a single trial by an 
individual taster, the score on a given product is the plot yield, 
and the number of different products which a taster can 
differentiate on a single trial is less than the number of prod¬ 
ucts being tasted. 

(iv) Education or psychology tests in which the block is the trials 
by a single child on a given day, the plot is a single test on this 
day, and the number of tests being considered is greater than 
the number the child can take on a given day without tiring. 

(v) An engineering experiment in which the block is an oven and 
the number of treatments is greater than the number of items 
which can be heated at a time. 
(b) The number of treatments is so large that enough homogeneous plots
cannot be found for a complete block.

(i) If we have 81 varieties of corn to test, it is often impossible to 
find 81 more or less homogeneous plots to form a complete 
block. 

(ii) For greenhouse experiments, temperature and light conditions 
vary so much that it is desirable to form blocks of rather small 
size, the size often being smaller than needed to include all of 
the pots for a complete block. 

We shall not include many examples of these designs in this book, because 
of a limitation of space. Students interested in more examples should 
consult references 1 to 3. 

19.2. General Least-squares Equations. The experimental model
for this design is

$$Y_{ij} = n_{ij}(\mu + \tau_i + \beta_j + \epsilon_{ij}), \qquad i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, q,$$

where $n_{ij} = 0$ or 1 and $\sum_i \tau_i = \sum_j \beta_j = 0$.

The least-squares equations for this model are

$$m:\quad Nm = G,$$
$$b_j:\quad km + \sum_{i=1}^{p} n_{ij} t_i + k b_j = B_j,$$
$$t_i:\quad rm + r t_i + \sum_{j=1}^{q} n_{ij} b_j = T_i.$$

The reader will note that the block and treatment effects are mixed up in
these equations; this is called confounding of the treatment and block
effects.

The first step in solving for the treatment effects is to adjust the treatment
effects for the block effects by removing the block constants from
the treatment equations. This is accomplished by multiplying the $b_j$
equation by $n_{ij}/k$ and subtracting the sum of all of these altered $b_j$ equations
from the $t_i$ equation. The resultant equation is

$$r t_i - \frac{1}{k}\sum_{j=1}^{q}\sum_{l=1}^{p} n_{ij} n_{lj} t_l = T_i - \frac{1}{k}\sum_{j=1}^{q} n_{ij} B_j = A_i,$$

where $A_i$ is called an adjusted treatment total (adjusted for block effects)
and $\sum_i A_i = 0$. The adjusting factor $\sum_j n_{ij} B_j$ is often written as Tu, the
total yield of all blocks containing treatment i. Suppose the ith treatment
appears $\lambda_{il}$ times in the same block with the lth treatment. Then

$$\sum_j n_{ij} n_{lj} = \begin{cases} r & \text{if } l = i, \\ \lambda_{il} & \text{if } l \ne i. \end{cases}$$

Hence we obtain

$$\Big[r(k - 1)t_i - \sum_{l \ne i} \lambda_{il} t_l\Big]\Big/k = A_i, \qquad i, l = 1, 2, \ldots, p.$$

The values of the $t_i$ can be solved only by the simultaneous solution of
the p equations in $A_i$, with the linear restraint $\sum_i t_i = 0$. One method of
solving these p equations is to set $t_i = t_i'' + t_p$, so that $t_p'' = 0$ and
$t_p = -\sum_i t_i''/p$. Hence

$$A_i = \Big[r(k - 1)t_i'' - \sum_{l \ne i} \lambda_{il} t_l''\Big]\Big/k, \qquad i = 1, 2, \ldots, p - 1,$$

since $\sum_{l \ne i} \lambda_{il} = r(k - 1)$.
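The quantities $\sum_j n_{ij} n_{lj}$ are easily tabulated from the plan of the experiment. The sketch below does this for a small hypothetical design of our own (4 treatments in 4 blocks of 3 plots, not an example from the text) and checks the two relations used above: the diagonal entries equal r, and the off-diagonal entries in each row add to r(k - 1).

```python
# Hypothetical incomplete-blocks plan: each tuple lists the treatments in one block.
blocks = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
p = 4                                  # number of treatments
k = len(blocks[0])                     # plots per block
q = len(blocks)                        # number of blocks

# Incidence values n[i][j] = 1 if treatment i+1 appears in block j, else 0.
n = [[1 if i in block else 0 for block in blocks] for i in range(1, p + 1)]

# concurrence[i][l] = sum over blocks of n_ij * n_lj:
# r on the diagonal and lambda_il off the diagonal.
concurrence = [
    [sum(n[i][j] * n[l][j] for j in range(q)) for l in range(p)]
    for i in range(p)
]

r = concurrence[0][0]
for i in range(p):
    off_diagonal = sum(concurrence[i][l] for l in range(p) if l != i)
    print(i + 1, concurrence[i], off_diagonal == r * (k - 1))
```

In this plan every off-diagonal entry is 2, so that all the $\lambda_{il}$ are equal.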




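We see that

$$t_i'' = \sum_{l=1}^{p-1} c_{il}'' A_l,$$

where the $c_{il}''$ are the elements of the inverse matrix of the coefficients of
$t_i''$ in the (p - 1) equations in $A_i$.

In this case

$$\text{SST(adj)} = \sum_{i=1}^{p-1} t_i'' A_i = \sum_{i=1}^{p-1}\sum_{l=1}^{p-1} c_{il}'' A_i A_l,$$

since $\sum_i A_i = 0$. It can be shown that

$$E[\text{MST(adj)}] = \sigma^2 + \theta\{\tau_i\}/(p - 1),$$

where $\theta$ is 0 when $\{\tau_i = 0\}$. Also,

$$\sigma^2(t_i'') = c_{ii}''\sigma^2 \quad\text{and}\quad \sigma^2(t_i'' - t_l'') = (c_{ii}'' - 2c_{il}'' + c_{ll}'')\sigma^2,$$

where $\hat{\sigma}^2 = s^2 = \text{SSE}/(N - p - q + 1)$.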

If only a test of the null hypothesis $\{\tau_i = 0\}$ is desired, the forward 
solution of the abbreviated Doolittle method is sufficient, where the A 
values replace the G values in the computations of Sec. 15.3 and the 
coefficients of $t'_i$ and $t'_l$ are the elements of the original matrix. An example 
of this computing procedure will be given in Chap. 20 (Example 20.4). 
No detailed computing techniques are given in the present chapter, 
because almost all incomplete-blocks designs which are used have certain 
restrictions to make the analysis simpler. Some of these designs will be 
discussed in succeeding sections of this chapter. 

19.3. Balanced Incomplete-blocks Designs. If every treatment 
appears with every other treatment in the same block an equal number of 
times, say $\lambda$, the incomplete-blocks design is said to be balanced (all 
$\lambda_{il} = \lambda$). In this case the adjusted treatment equation becomes 

$$\frac{r(k-1) + \lambda}{k}\,t_i = A_i,$$

because $\sum_l t_l = 0$, so that $\sum_{l \ne i} t_l = -t_i$. If we set 

$$\frac{r(k-1) + \lambda}{rk} = E_f, \qquad t_i = A_i/rE_f,$$

$E_f$ is called the efficiency factor for the incomplete-blocks design, since 
the number of effective replications is $rE_f$, instead of r, where $E_f < 1$. 
We see that the number of blocks in this case is 

$$q = \lambda C(p,2)/C(k,2) = \lambda p(p-1)/k(k-1),$$

where C(a,b) is the number of combinations of a things taken b at a time. 
Hence $\lambda p(p-1) = qk(k-1) = rp(k-1)$, and 

$$E_f = \frac{r(k-1) + \lambda}{rk} = \frac{\lambda p}{rk} = \frac{p(k-1)}{k(p-1)}.$$

We might say that a necessary condition for balance is that 

$$\lambda p(p-1) = qk(k-1) = rp(k-1).$$

However, it may not be possible to construct a balanced design even 
though this condition is met. 
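The necessary condition can be checked mechanically before a design is sought. The sketch below is illustrative only; the function name `bib_parameters` is an assumption, and a non-empty result does not guarantee that a balanced design can actually be constructed.

```python
from fractions import Fraction

def bib_parameters(p, k, r):
    """Return (q, lam, Ef) for p treatments, block size k, and r replications,
    or None when q or lambda would not be a whole number."""
    if (r * p) % k or (r * (k - 1)) % (p - 1):
        return None
    q = r * p // k                          # from qk = rp
    lam = r * (k - 1) // (p - 1)            # from lambda(p - 1) = r(k - 1)
    ef = Fraction(p * (k - 1), k * (p - 1)) # efficiency factor
    return q, lam, ef

lattice = bib_parameters(9, 3, 4)    # the balanced lattice of Example 19.1
too_few = bib_parameters(9, 3, 3)    # three replications cannot be balanced
```

For p = 9, k = 3, and r = 4 this gives q = 12, $\lambda$ = 1, and $E_f$ = 3/4, in agreement with Example 19.1.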

The adjusted mean yield for the ith treatment is 

$$t'_i = t_i + m = A_i/rE_f + G/rp = [k(p-1)A_i + (k-1)G]/rp(k-1).$$

In Sec. 15.3 (page 199), it was indicated that the reduction due to any 
fixed variate, $X_i$, adjusted for all previous fixed variates is given by $A_{iy}B_{iy}$, 
where $A_{iy} = Sy(x_i$ adjusted for all previous fixed variates in the matrix) 
and $B_{iy}$ was the regression coefficient for $X_i$. In the balanced 
incomplete-blocks design, the regression coefficient $t_i = A_i/rE_f$, where $A_i$ is the sum 
of cross products, adjusted for block effects. Hence the treatment sum 
of squares adjusted for block effects is 

$$\text{SST(adj)} = \sum_i A_i^2/rE_f.$$

A more formal proof would be the following: 

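The total reduction in sum of squares due to blocks and treatments is 

$$\text{SSR} = \sum_j b_jB_j + \sum_i t_iT_i.$$

If the treatment effects are omitted from the model, 

$$\text{SSR}' \text{ (omitting treatments)} = \sum_j (B_j - \bar B)^2/k = \sum_j B_j(B_j - \bar B)/k.$$

Hence the sum of squares due to treatments is 

$$\text{SST(adj)} = \text{SSR} - \text{SSR}'$$

$$= \sum_j B_j\Bigl[b_j - \frac{B_j - \bar B}{k}\Bigr] + \sum_i A_i\Bigl[A_i + \frac{1}{k}\sum_j n_{ij}B_j\Bigr]/rE_f$$

$$= \sum_i A_i^2/rE_f + \sum_j B_j\Bigl[b_j + \frac{1}{k}\sum_i n_{ij}t_i - \frac{B_j - \bar B}{k}\Bigr],$$

where $\bar B = G/q$. From the G and $B_j$ least-squares equations (Sec. 19.2), 
we see that $b_j + (1/k)\sum_i n_{ij}t_i = (B_j - \bar B)/k$. Hence SST(adj) = $\sum_i A_i^2/rE_f$. 

Also, SSE = $Sy^2$ - SSR with (N - p - q + 1) degrees of freedom. 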

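It might be noted that 

$$B_j - \bar B = k\beta_j + \sum_i n_{ij}\tau_i + \Bigl[\sum_i n_{ij}\epsilon_{ij} - \frac{1}{q}\sum_i\sum_j n_{ij}\epsilon_{ij}\Bigr],$$

$$A_i = rE_f\tau_i + \Bigl[\sum_j n_{ij}\epsilon_{ij} - \frac{1}{k}\sum_j\sum_l n_{ij}n_{lj}\epsilon_{lj}\Bigr],$$

$$G = rp\mu + \sum_i\sum_j n_{ij}\epsilon_{ij}.$$

It can be shown that† 

$$E[(B_j - \bar B)^2] = \Bigl(k\beta_j + \sum_i n_{ij}\tau_i\Bigr)^2 + k(q-1)\sigma^2/q,$$

$$E(A_i^2) = (rE_f\tau_i)^2 + r(k-1)\sigma^2/k,$$

$$E(G^2) = (rp\mu)^2 + rp\sigma^2,$$

$$\sigma(A_iA_l) = -\lambda\sigma^2/k, \qquad i \ne l,$$

$$\sigma[(B_j - \bar B)A_i] = 0 = \sigma[(B_j - \bar B)G] = \sigma(A_iG).$$

† These expectations are based on the assumption that the blocks are fixed. If 
the block effects are assumed to be random, more efficient estimates of the treatment 
means can be obtained by utilizing the so-called recovery of interblock information. 
This estimation procedure was first introduced by Yates and will be discussed in 
Sec. 24.1. 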




The analysis of variance is: 

Source of variation    Degrees of freedom    Mean square                          E(MS)
Blocks                 q - 1                 $\sum_j (B_j - \bar B)^2/k(q-1)$
Treatments (adj)       p - 1                 $\sum_i A_i^2/rE_f(p-1)$             $\sigma^2 + rE_f\theta(\tau)$†
Error                  N - p - q + 1         $s^2$                                $\sigma^2$

† $\theta(\tau) = \sum_i \tau_i^2/(p-1)$. 

It is easy to see that the one-tailed F test is appropriate to test 

$$H_0: \{\tau_i = 0\}$$

with $\sigma^2$ estimated by the error sum of squares divided by (N - p - q + 1). 

$$\sigma^2(t_i) = (k-1)\sigma^2/rkE_f^2.$$

$$\sigma^2(t'_i) = \sigma^2(t_i) + \sigma^2/rp.$$

$$\sigma^2(t_i - t_l) = \sigma^2(t'_i - t'_l) = [2r(k-1) + 2\lambda]\sigma^2/k(rE_f)^2 = 2\sigma^2/rE_f.$$

The problem of estimating the efficiency of a balanced incomplete- 
blocks design as compared with other designs will be discussed in Chap. 
24, except for a special type of balanced design which is discussed below. 

Example 19.1. A special type of balanced incomplete-blocks 
design is one with $p = k^2$ treatments, called a balanced lattice design.‡ 
This design can be set up in r complete replications of k blocks each (with 
k treatments per block); hence, q = rk, $\lambda = r(k-1)/(p-1) = r/(k+1)$, 
and $E_f = k/(k+1)$. A balanced lattice design with p = 9 (k = 3) and 
r = 4 was used to test 9 rations fed to rats. Hence $\lambda = 1$, and $E_f = 3/4$. 
The gains for this experiment were (the ration numbers are in parentheses): 


Replication I
Block j    Gains                         B_j
1          (1) 20   (4) 15   (7) 11      46
2          (3)  8   (6) 18   (9) 26      52
3          (2) 18   (5) 16   (8)  2      36
Total                                    134

Replication II
Block j    Gains                         B_j
4          (7)  8   (8) 12   (9) 16      36
5          (1) 20   (2)  2   (3)  2      24
6          (4) 20   (5)  6   (6)  2      28
Total                                    88

Replication III
Block j    Gains                         B_j
7          (1) 13   (9) 19   (5) 14      46
8          (8) 14   (4) 34   (3)  2      50
9          (6) 14   (2) 20   (7) 14      48
Total                                    144

Replication IV
Block j    Gains                         B_j
10         (5) 19   (7) 23   (3)  6      48
11         (1) 22   (6) 12   (8)  2      36
12         (9) 27   (2)  7   (4) 20      54
Total                                    138

‡ A more complicated lattice with $p = k^3$ treatments is called a cubic lattice. 
The computations are as shown in Table 19.1. 

Table 19.1 

Ration i    Total gain = T_i    Σ_j n_ij B_j = T_bi    3A_i = 3T_i - T_bi    t'_i
1           75                  152                    73                    22.11
2           47                  162                    -21                   11.67
3           18                  174                    -120                  .67
4           89                  178                    89                    23.89
5           55                  158                    7                     14.78
6           46                  164                    -26                   11.11
7           56                  178                    -10                   12.89
8           30                  158                    -68                   6.44
9           88                  188                    76                    22.44
Total       504 = G             1,512 = 3G             0                     126.00
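The columns of Table 19.1 can be reproduced from the block data. The sketch below is an illustration rather than the authors' computing routine; the dictionary layout and the variable names (`blocks`, `T`, `Tb`, `kA`, `adj_mean`) are assumptions, and exact fractions are used so that no rounding enters before the final display.

```python
from fractions import Fraction

# Blocks 1 to 12 of Example 19.1, each stored as {ration: gain}.
blocks = [
    {1: 20, 4: 15, 7: 11}, {3: 8, 6: 18, 9: 26}, {2: 18, 5: 16, 8: 2},
    {7: 8, 8: 12, 9: 16}, {1: 20, 2: 2, 3: 2}, {4: 20, 5: 6, 6: 2},
    {1: 13, 9: 19, 5: 14}, {8: 14, 4: 34, 3: 2}, {6: 14, 2: 20, 7: 14},
    {5: 19, 7: 23, 3: 6}, {1: 22, 6: 12, 8: 2}, {9: 27, 2: 7, 4: 20},
]
p, k, r = 9, 3, 4
G = sum(sum(b.values()) for b in blocks)
# T_i: total gain of ration i; T_bi: total of all blocks containing ration i.
T = {i: sum(b[i] for b in blocks if i in b) for i in range(1, p + 1)}
Tb = {i: sum(sum(b.values()) for b in blocks if i in b) for i in range(1, p + 1)}
kA = {i: k * T[i] - Tb[i] for i in T}            # 3A_i = 3T_i - T_bi
Ef = Fraction(p * (k - 1), k * (p - 1))          # efficiency factor, 3/4 here
# Adjusted means t'_i = A_i / (r Ef) + G / (rp).
adj_mean = {i: Fraction(kA[i], k) / (r * Ef) + Fraction(G, r * p) for i in T}
```

Converting `adj_mean` to decimals and rounding to two places gives the last column of the table.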


In this experiment, a constant for replications can be inserted in the 
model and a sum of squares for replications removed from the block sum 
of squares. The total sum of squares for blocks is 

$$\text{SSB} = \frac{(46)^2 + (52)^2 + \cdots + (36)^2 + (54)^2}{3} - \frac{(504)^2}{36} = 7{,}402.67 - 7{,}056 = 346.67.$$

The sum of squares for replications is 

$$\frac{(134)^2 + \cdots + (138)^2}{9} - 7{,}056 = 219.56.$$

Hence the sum of squares for blocks in replications is 
346.67 - 219.56 = 127.11. 

The sum of squares for treatments (adjusted for blocks) is 

$$\text{SST(adj)} = \frac{(73)^2 + (-21)^2 + \cdots + (76)^2}{(9)(3)} = 1{,}456.15.$$

The error sum of squares is 

$$Sy^2 - \text{SSB} - \text{SST(adj)} = 9{,}316 - 7{,}056 - 346.67 - 1{,}456.15 = 457.18.$$




Hence the analysis-of-variance table is: 

Source of variation         Degrees of freedom    Sum of squares    Mean square    E(MS)
Replications                3                     219.56            73.19
Blocks (in replications)    8                     127.11            15.89
Treatments (adj)            8                     1,456.15          182.02         $\sigma^2 + 3\theta(\tau)$
Error                       16                    457.18            28.57          $\sigma^2$
Total                       35                    2,260

To test for treatment differences, F = 182.02/28.57 = 6.37 with 8 and 
16 degrees of freedom, for which $\alpha < .001$. The standard error of the 
difference between any two adjusted treatment means $(t'_i - t'_l)$ is 

$$\sqrt{\frac{2(28.57)}{3}} = 4.36.$$
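The sums of squares of this example can be checked with a few lines of code. This is a sketch under the fixed-block model of Sec. 19.3, not a general program; the nested-list layout `reps` and the variable names are assumptions made for the illustration.

```python
# Example 19.1 data: four replications, each a list of blocks {ration: gain}.
reps = [
    [{1: 20, 4: 15, 7: 11}, {3: 8, 6: 18, 9: 26}, {2: 18, 5: 16, 8: 2}],
    [{7: 8, 8: 12, 9: 16}, {1: 20, 2: 2, 3: 2}, {4: 20, 5: 6, 6: 2}],
    [{1: 13, 9: 19, 5: 14}, {8: 14, 4: 34, 3: 2}, {6: 14, 2: 20, 7: 14}],
    [{5: 19, 7: 23, 3: 6}, {1: 22, 6: 12, 8: 2}, {9: 27, 2: 7, 4: 20}],
]
p, k, r = 9, 3, 4
blocks = [b for rep in reps for b in rep]
ys = [y for b in blocks for y in b.values()]
G, N = sum(ys), len(ys)
cf = G ** 2 / N                                   # correction term G^2/N
total_ss = sum(y * y for y in ys) - cf
ssb = sum(sum(b.values()) ** 2 for b in blocks) / k - cf
ss_rep = sum(sum(sum(b.values()) for b in rep) ** 2 for rep in reps) / p - cf
# kA[i-1] holds 3A_i = 3T_i - T_bi for ration i.
kA = [k * sum(b[i] for b in blocks if i in b)
      - sum(sum(b.values()) for b in blocks if i in b) for i in range(1, p + 1)]
Ef = p * (k - 1) / (k * (p - 1))
sst_adj = sum(a * a for a in kA) / (k * k * r * Ef)   # sum of A_i^2 / (r Ef)
sse = total_ss - ssb - sst_adj
mse = sse / (N - p - len(blocks) + 1)
F = (sst_adj / (p - 1)) / mse
se_diff = (2 * mse / (r * Ef)) ** 0.5
```

The quantities `ssb`, `ss_rep`, `sst_adj`, `sse`, `F`, and `se_diff` correspond to the entries worked out above.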


In this case we can compute the efficiency of this design relative to a 
randomized complete-blocks design with 4 replications (complete blocks). 
First we compute SST (unadjusted) and then the estimated 
randomized-blocks error as the difference $Sy^2$ - SST - SS (replications). 

Source of variation        Degrees of freedom    Sum of squares    Mean square
Replications               3                     219.56
Treatments                 8                     1,194.00
Randomized blocks error    24                    846.44            35.27
Total                      35                    2,260.00

The effective incomplete-blocks error is $28.57/E_f = 38.09$. Hence the 
efficiency of the incomplete-blocks design is only 35.27/38.09 = .93. In 
addition there is another loss due to fewer error degrees of freedom (see 
reference 1, page 28). 
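The efficiency comparison reduces to one ratio. A minimal sketch follows, using the sums of squares quoted in this example; the function name `relative_efficiency` is an assumption, and the loss from fewer error degrees of freedom is not included.

```python
def relative_efficiency(rb_error_ms, ib_error_ms, ef):
    """Randomized-blocks error mean square divided by the effective
    incomplete-blocks error mean square (ib_error_ms / ef)."""
    return rb_error_ms / (ib_error_ms / ef)

total_ss, ss_rep, sst_unadj = 2260.0, 219.56, 1194.0
rb_error_ms = (total_ss - ss_rep - sst_unadj) / 24   # 24 error degrees of freedom
eff = relative_efficiency(rb_error_ms, 28.57, 0.75)  # Ef = 3/4 for this design
```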

19.4. The Simple Lattice Designs. One special type of a nonbalanced 
incomplete-blocks design is a lattice design with fewer replications than 
are needed for balance. A lattice design with two replications is called a 
simple lattice, one with three replications a triple lattice, etc. It was 
shown in Example 19.1 that a lattice is balanced if the number of 
replications r = k + 1. Hence, if r < k + 1, the lattice is not balanced. The 
analysis of a nonbalanced lattice is much simpler than the general analysis 
presented in Sec. 19.2, because of a special method of allocating the 
treatments to the blocks. The treatments are assigned at random one of 
the following treatment numbers: 

11   12   ...   1k
21   22   ...   2k
...
k1   k2   ...   kk

If a simple lattice is used (r = 2), the row combinations (11,12, . . . ,1k; 
21,22, . . . ,2k; etc.) are assigned to separate blocks in one replication, 
X, and the column combinations (11,21, . . . ,k1; 12,22, . . . ,k2; etc.) 
assigned to separate blocks in the second replication, Y. If a triple 
lattice is used, an additional replication, Z, is taken from the diagonals 
(11,22, . . . ,kk; 21,32, . . . ,1k; etc.). For a complete discussion of 
this design, see references 2 and 7. 

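Let us consider the analysis of a simple lattice experiment (r = 2). We 
shall designate the treatment effects as $\tau_{ij}$ (i, j = 1,2, . . . ,k) and 
similarly for treatment totals ($T_{ij}$) and adjusted treatment totals ($A_{ij}$). Let 

$$\sum_j t_{ij} = a_i, \qquad \sum_i t_{ij} = d_j.$$

The yield of the (i,j) treatment in the X replication is designated as $X_{ij}$ 
and in the Y replication as $Y_{ij}$. Therefore $T_{ij} = X_{ij} + Y_{ij}$. And finally 
we shall use the notation 

$$\sum_j X_{ij} = X_{i.}, \qquad \sum_i X_{ij} = X_{.j}, \qquad \sum_i\sum_j X_{ij} = X_{..},$$

and similarly for Y. Hence 

$$A_{ij} = T_{ij} - \frac{1}{k}(X_{i.} + Y_{.j}).$$

Also we note that $\lambda$ will be 1 for those treatments with the same row or 
column subscript and 0 elsewhere. Hence 

$$A_{ij} = \Bigl[2(k-1)t_{ij} - \sum_{j' \ne j} t_{ij'} - \sum_{i' \ne i} t_{i'j}\Bigr]/k = 2t_{ij} - (a_i + d_j)/k,$$

$$\sum_j A_{ij} = Y_{i.} - Y_{..}/k = a_i, \qquad \sum_i A_{ij} = X_{.j} - X_{..}/k = d_j,$$

since $\sum_i a_i = \sum_j d_j = \sum_i\sum_j t_{ij} = 0$. 

We have shown that the sum of squares for treatments (adjusted for 
blocks) is 

$$\text{SST(adj)} = \sum_i\sum_j A_{ij}t_{ij} = \frac{1}{2}\sum_i\sum_j A_{ij}[A_{ij} + (a_i + d_j)/k]$$

$$= \frac{1}{2}\sum_i\sum_j \{[T_{ij} - (X_{i.} + Y_{.j})/k] \cdot [T_{ij} - (X_{i.} + Y_{.j} - Y_{i.} - X_{.j})/k]\},$$

where the constant $(X_{..} + Y_{..})/k^2$ has been dropped from the second factor 
because $\sum_i\sum_j A_{ij} = 0$. 

By expanding these products term by term, we have 

$$\text{SST(adj)} = \frac{1}{2}\Bigl\{\sum_i\sum_j T_{ij}^2 - \frac{(X_{..} - Y_{..})^2}{k^2} - \frac{1}{k}\Bigl[\sum_i T_{i.}^2 + \sum_j T_{.j}^2\Bigr] + \frac{2}{k}\Bigl[\sum_i Y_{i.}^2 + \sum_j X_{.j}^2\Bigr]\Bigr\},$$

where $T_{i.} = X_{i.} + Y_{i.}$ and $T_{.j} = X_{.j} + Y_{.j}$. 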

The error sum of squares is obtained by computing the unadjusted 
block sum of squares, SSB, and then subtracting to obtain 

$$\text{SSE} = Sy^2 - \text{SSB} - \text{SST(adj)}.$$

The analysis of variance is: 

Source of variation    Degrees of freedom    Sum of squares    Mean square
Blocks                 2k - 1                SSB
Treatments (adj)       k^2 - 1               SST(adj)          MST(adj)
Error                  (k - 1)^2             SSE               s^2

The blocks sum of squares can be divided into two parts, for replications 
(1 degree of freedom) and blocks in replications [2(k - 1) degrees of 
freedom]. The replication part is independent of treatment effects and 
equals $(X_{..} - Y_{..})^2/2k^2$. The blocks-in-replication part is a mixture of 
block and treatment effects as well as error and is simply 
SSB - $(X_{..} - Y_{..})^2/2k^2$.† 

† Most analyses of lattice designs now make use of the recovery of interblock 
information, mentioned in the footnote on page 253. See references 1 and 8 for some 
examples of the theory and computational procedures. 

In this case it is not possible to give a single figure for the variance of 
the difference between two adjusted means, since some treatments will 
appear together in the same block and others never appear together in 
the same block. It seems reasonable that the variance of the difference 
will be lower for those treatments appearing together in the same block. 
Let us first consider two adjusted treatment means $t'_{11}$ and $t'_{21}$ which 
appear in the same block. 

$$t'_{11} = \tfrac{1}{2}[X_{11} + Y_{11} - (X_{1.} - Y_{1.} - X_{.1} + Y_{.1})/k],$$

$$t'_{21} = \tfrac{1}{2}[X_{21} + Y_{21} - (X_{2.} - Y_{2.} - X_{.1} + Y_{.1})/k],$$

$$t'_{11} - t'_{21} = \frac{1}{2k}\Bigl[(k-1)(X_{11} - X_{21}) - \sum_{j \ne 1}(X_{1j} - X_{2j}) + (k+1)(Y_{11} - Y_{21}) + \sum_{j \ne 1}(Y_{1j} - Y_{2j})\Bigr],$$

$$\sigma^2(t'_{11} - t'_{21}) = (2/4k^2)[(k-1)^2 + (k-1) + (k+1)^2 + (k-1)]\sigma^2 = (k+1)\sigma^2/k.$$

Now consider two treatments which do not appear together in the same 
block, such as $t'_{11}$ and $t'_{22}$. In this case the $X_{.j}$ and $Y_{.j}$ do not cancel out, 
as above, and we have 

$$\sigma^2(t'_{11} - t'_{22}) = (k+2)\sigma^2/k.$$

We note that there are $k^2(k^2-1)/2$ possible treatment comparisons, 
of which $k^2(k-1)$ are between treatments in the same block and 
$k^2(k-1)^2/2$ between treatments not in the same block. Hence we might 
use as an average variance of the difference between adjusted treatment 
effects, 

$$\frac{[k^2(k-1)(k+1)/k + k^2(k-1)^2(k+2)/2k]\sigma^2}{k^2(k^2-1)/2} = \frac{(k+3)\sigma^2}{k+1}.$$

The factor (k + 1)/(k + 3) is the efficiency factor, $E_f$, for this design. 
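The averaging step can be verified with exact arithmetic for any block size. The sketch below is illustrative; the function name `average_variance_factor` is an assumption, and the result is expressed in units of $\sigma^2$.

```python
from fractions import Fraction

def average_variance_factor(k):
    """Average variance of a difference between two adjusted treatment means
    in a simple lattice with k*k treatments, as a multiple of sigma^2."""
    same = Fraction(k * k * (k - 1))              # pairs sharing a block
    apart = Fraction(k * k * (k - 1) ** 2, 2)     # pairs never sharing a block
    total = Fraction(k * k * (k * k - 1), 2)      # all pairs of treatments
    assert same + apart == total
    return (same * Fraction(k + 1, k) + apart * Fraction(k + 2, k)) / total

factor_3 = average_variance_factor(3)   # the 3 x 3 lattice of Example 19.2
```

For k = 3 the factor is 6/4, which is the multiplier of $s^2$ used for the average standard error in Example 19.2.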

Example 19.2. To illustrate the computing techniques for the simple 
lattice, we shall use the first two replications of the (3 X 3) experiment 
considered in Example 19.1. We shall designate the treatments as 11, 
12, 13, . . . , 33 instead of 1, 2, . . . , 9, with X for Replication II and 
Y for Replication I. The treatment and X and Y totals are (treatment 
numbers in parentheses): 

Row i    j = 1      j = 2      j = 3      T_i.    X_i.    Y_i.
1        (11) 40    (12) 20    (13) 10    70      24      46
2        (21) 35    (22) 22    (23) 20    77      28      49
3        (31) 19    (32) 14    (33) 42    75      36      39
T_.j     94         56         72         222     88      134
X_.j     48         20         20         88
Y_.j     46         36         52         134

$$Sy^2 = 3{,}706 - \frac{(222)^2}{18} = 968,$$

$$\text{SST(adj)} = \frac{1}{2}\Bigl\{(40)^2 + \cdots + (42)^2 - \frac{(88 - 134)^2}{9} - \frac{(70)^2 + (77)^2 + \cdots + (72)^2}{3} + \frac{2[(48)^2 + (20)^2 + \cdots + (39)^2]}{3}\Bigr\}$$

$$= \tfrac{1}{2}[6{,}530 - 235.11 - 11{,}203.33 + 6{,}094.67] = 593.11,$$

$$\text{SSB} = \frac{(24)^2 + (28)^2 + \cdots + (52)^2}{3} - \frac{(222)^2}{18} = 186.$$




The analysis of variance is: 

Source of variation         Degrees of freedom    Sum of squares    Mean square
Replications                1                     118
Blocks (in replications)    4                     68
Treatments (adj)            8                     593               74
Error                       4                     189               47.2
Total                       17                    968

Treatments (unadjusted)     8                     527
Randomized blocks error     8                     323               40.4


The estimated average standard error of the difference between two 
adjusted treatment effects is $\sqrt{6s^2/4} = 8.4$. The adjusted treatment 
means are $t'_{ij} = [kT_{ij} - \delta_{ij}]/2k$, where 
$\delta_{ij} = (X_{i.} - Y_{i.} - X_{.j} + Y_{.j})$. The $\delta_{ij}$ and $t'_{ij}$ are presented below. 

$\delta_{ij}$
i        j = 1    j = 2    j = 3    Total
1        -24      -6       10       -20
2        -23      -5       11       -17
3        -5       13       29       37
Total    -52      2        50       0

$t'_{ij}$
i        j = 1    j = 2    j = 3    Total
1        24.0     11.0     3.3      38.3
2        21.3     11.8     8.2      41.3
3        10.3     4.8      16.2     31.3
Total    55.7     27.7     27.7     111.0

The efficiency of this design as compared with randomized complete 
blocks is only 40.4/[3(47.2)/2] = 0.57, plus the fact that we have only 4 
instead of 8 error degrees of freedom. This result has no practical use 
because no experiment should be planned if it has only 4 degrees of 
freedom for the estimate of the error variance. 
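The simple-lattice computations can be traced in code. The following is a sketch under the intrablock (fixed-block) analysis of this section; the array names are assumptions, with `X[i][j]` and `Y[i][j]` holding the yields of treatment (i + 1, j + 1) in Replications II and I of Example 19.1.

```python
k = 3
# X = Replication II (rows are blocks); Y = Replication I (columns are blocks).
X = [[20, 2, 2], [20, 6, 2], [8, 12, 16]]
Y = [[20, 18, 8], [15, 16, 18], [11, 2, 26]]
T = [[X[i][j] + Y[i][j] for j in range(k)] for i in range(k)]
Xi = [sum(row) for row in X]                                  # X_i.
Yi = [sum(row) for row in Y]                                  # Y_i.
Xj = [sum(X[i][j] for i in range(k)) for j in range(k)]       # X_.j
Yj = [sum(Y[i][j] for i in range(k)) for j in range(k)]       # Y_.j
Ti = [Xi[i] + Yi[i] for i in range(k)]
Tj = [Xj[j] + Yj[j] for j in range(k)]
Xt, Yt = sum(Xi), sum(Yi)
# Expanded form of SST(adj) for the simple lattice.
sst_adj = 0.5 * (sum(t * t for row in T for t in row)
                 - (Xt - Yt) ** 2 / k ** 2
                 - (sum(t * t for t in Ti) + sum(t * t for t in Tj)) / k
                 + 2 * (sum(y * y for y in Yi) + sum(x * x for x in Xj)) / k)
# delta_ij = X_i. - Y_i. - X_.j + Y_.j and adjusted means (k T_ij - delta_ij) / 2k.
delta = [[Xi[i] - Yi[i] - Xj[j] + Yj[j] for j in range(k)] for i in range(k)]
adj_mean = [[(k * T[i][j] - delta[i][j]) / (2 * k) for j in range(k)]
            for i in range(k)]
```

As a check on the algebra, the sum of $A_{ij}t_{ij}$ computed from `adj_mean` (after subtracting the general mean) agrees with `sst_adj`.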

If the simple lattice is duplicated several times so that 2d replications 
(2kd blocks) are used, the analysis is changed in the following respects: 

(i) There are now (2d - 1) degrees of freedom for replications and 
2d(k - 1) degrees of freedom for blocks in replications. The differences 
between block totals with the same treatments in the blocks are free of 
treatment effects and hence indicate real block differences. The block 
totals can be analyzed as follows for the X group (and similarly for the 
Y group): 

                Degrees of freedom
Replications    d - 1
X groups        k - 1
Block effect    (d - 1)(k - 1)

Hence there are 2(d - 1)(k - 1) degrees of freedom for real block 
differences and 2(k - 1) degrees of freedom with block and treatment effects 
mixed up. 

(ii) Each $X_{ij}$ and $Y_{ij}$ is now a total of d yields and $T_{ij}$ a total of 2d 
yields. Hence all sums of squares must be divided by 2d instead of 2 
(similarly for means). Also, $\lambda$ will now be d instead of 1. 

(iii) The above variances of the differences between two adjusted 
treatment effects are divided by d. 

(iv) There are now (k - 1)(2dk - k - 1) degrees of freedom for 
error. 
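The degrees of freedom listed in (i) and (iv) must add to the total, which gives a quick check on the partition. A minimal sketch follows; the function name `duplicated_lattice_df` and the dictionary keys are assumptions.

```python
def duplicated_lattice_df(k, d):
    """Degrees of freedom for a simple lattice with k*k treatments
    repeated d times (2d replications, 2kd blocks)."""
    df = {
        "replications": 2 * d - 1,
        "blocks in replications": 2 * d * (k - 1),
        "treatments (adj)": k * k - 1,
        "error": (k - 1) * (2 * d * k - k - 1),
    }
    df["total"] = 2 * d * k * k - 1
    return df

single = duplicated_lattice_df(3, 1)   # the unduplicated 3 x 3 simple lattice
```

With d = 1 the partition reduces to the table given earlier for the simple lattice, with $(k-1)^2$ degrees of freedom for error.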

19.5. Other Lattice Designs. Yates presents the theory for triple 
lattices. We shall not include the details in this book. The essential 
change is that each treatment is given a third subscript as was done for 
the Latin-square design, and a Z replication is introduced. This design 
is also presented in references 1 and 8. If more than three replications 
are used, in almost all cases either the simple or triple lattice is duplicated 
or a balanced design is used. 

It should be indicated that it is also possible to set up a lattice 
experiment in a Latin-square design, called a lattice-square design. The 
computing details are much more complicated than for the randomized blocks 
designs, but the reduction in error variance is often quite large, especially 
for very heterogeneous experimental material. Details of these designs 
are also presented in reference 1. 

Harshbarger introduced a new type of lattice design, called the 
rectangular lattice, in which the number of treatments is k(k + 1). This 
design enables the experimenter to utilize the simplicity of the lattice 
without restricting his experiments to exactly $k^2$ treatments. Cochran 
and Cox present simplified computing techniques for this type of design 
also. A computing manual has been prepared by Robinson and Watson. 

Kempthorne and Federer present the general theory of prime-power 
lattices. 

19.6. Methods of Constructing Incomplete-blocks Designs. 
References 3 and 13 present methods of constructing balanced 
incomplete-blocks designs. The construction of the lattice designs is quite simple, 
since the experimenter needs only to randomize the treatment numbers, 
then the blocks within a replication and the treatments in each block. 
We shall present one example of a balanced design which is not a lattice. 

Example 19.3. Consider the design for the 7 treatments in blocks of 
3 in Exercise 19.3. There are C(7,3) = 35 combinations of these 7 
treatments in groups of 3, but if we restrict ourselves to enough combinations 
so that each treatment appears once and only once with every other 
treatment ($\lambda$ = 1), we can manage with 7 blocks. The method of 
construction is one of gradual elimination. Obviously there are many ways 
of selecting the set of 7 from the set of 35 combinations. One such set of 
7 is 

ABC, ADE, AFG, BDF, BEG, CDG, CEF. 

Another was used in Exercise 19.3. Probably the best procedure to use 
is to select some basic rule, such as the one given in reference 1. The 
following set 

ABD, BCE, CDF, DEG, EFA, FGB, GAC 

was obtained by cyclic substitution (the first letter is one removed from 
the second and the third two removed from the second, where F + 2 = A, 
for example). Then randomly assign the 7 treatments to the 7 letters. 
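The cyclic substitution and the check that every pair of treatments occurs together exactly once can be sketched as follows. The function names and the offset triple (0, 1, 3), which encodes "first letter, next letter, two letters beyond that", are assumptions introduced for the illustration.

```python
from itertools import combinations

def cyclic_blocks(letters, offsets):
    """Develop one starting block cyclically through all the letters."""
    n = len(letters)
    return ["".join(letters[(start + o) % n] for o in offsets)
            for start in range(n)]

def pair_counts(blocks):
    """Count how often each unordered pair of treatments shares a block."""
    counts = {}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

blocks = cyclic_blocks("ABCDEFG", (0, 1, 3))
counts = pair_counts(blocks)
```

Here `blocks` reproduces the cyclic set above, and `counts` contains all 21 pairs, each with the value 1, so that $\lambda$ = 1. The same `pair_counts` check can be applied to any set found by gradual elimination.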

19.7. Summary. In Sec. 19.1, we indicated certain experimental 
conditions which might make the use of an incomplete-blocks design 
desirable. For illustrative purposes, some data from an experiment with 
nine treatments have been analyzed. Unless the blocks are so small that 
nine experimental units cannot be placed in each block, it is seldom 
advisable to use the incomplete-blocks design for so few treatments, unless 
enough replications can be used to give at least 20 degrees of freedom for 
the error mean square. Even then there is seldom enough gain in 
efficiency to warrant the extra computing time. The experimenter needs to 
know something about the block-to-block variability of his experimental 
material before he can decide whether or not he should use an 
incomplete-blocks design. We believe that it is advisable to build up a body of 
evidence from continuing experiments in order to help with this decision. 
Many engineering experiments and experiments in the physical and social 
sciences may make use of the incomplete-blocks designs, because the size 
of the blocks is often limited. However, little information is available 
on this point to date (see reference 14 for one example in the field of 
chemistry). 

Since the analyses of most incomplete-blocks data use the recovery of 
interblock information, more will be said on these points in Chap. 24. 
However, it should be warned here that it seems inadvisable to use this 
method of analysis unless there are at least 15 degrees of freedom for the 
block mean square (and preferably at least 25 degrees of freedom). 

EXERCISES 


19.1. A simple example of a balanced incomplete-blocks experiment is 
the following balanced lattice with four treatments, six blocks, and two 
treatments per block: 

                              Blocks
Treatments           Ia    Ib    IIa   IIb   IIIa  IIIb   Total
1                    30          30          25           85
2                    15                10          20     45
3                          30    25                40     95
4                          10          10    15           35
Total                45    40    55    20    40    60     260
Replication total       85          75          100

(a) Set up the analysis of variance and adjusted treatment means for 
this experiment, and test for treatment differences. 

(b) Estimate the standard error of the difference between any two 
adjusted treatment means. 

(c) Show how the replication constants fit into the model, and 
determine the expected value of the replication mean square in the analysis of 
variance. 

(d) Compute an estimate of the efficiency of this design relative to a 
randomized complete-blocks design. 

19.2. Dr. Pauline Paul conducted an experiment to compare the effects 
of cold storage on the tenderness and flavor of beef roasts. Six periods of 
storage (0, 1, 2, 4, 9, and 18 days) were tested (p = 6). Since the same 
cut of meat on each side of an animal was expected to be similar but 
different cuts on the same side dissimilar, it was decided to use a balanced 
incomplete-blocks design with k = 2, $\lambda$ = 1, q = 15, and r = 5. In this 
case it was also possible to arrange the cuts in complete replications of 3 
cuts from each side. The scores for tenderness of beef were (periods of 
storage in parentheses): 

Replication I              B_j
(0)  7     (1) 17          24
(2) 26     (4) 25          51
(9) 33    (18) 29          62
Total                      137

Replication II             B_j
(0) 17     (2) 27          44
(1) 23     (9) 27          50
(4) 29    (18) 30          59
Total                      153

Replication III            B_j
(0) 10     (4) 25          35
(1) 26    (18) 37          63
(2) 24     (9) 26          50
Total                      148

Replication IV             B_j
(0) 25     (9) 40          65
(1) 25     (4) 34          59
(2) 34    (18) 32          66
Total                      190

Replication V              B_j
(0) 11    (18) 27          38
(1) 24     (2) 21          45
(4) 26     (9) 32          58
Total                      141

The blocks run across the page. 

(a) Make a complete analysis of these data, showing the adjusted 
treatment means, the analysis of variance, the standard error of treatment 
differences, and general conclusions. 

(b) Show how to determine the number of paired cuts of beef (blocks) 
needed for a balanced experiment in this case. 

(c) Suppose it had been possible to pair the cuts into groups of four 
like ones, instead of two. What design would you then set up? 

(d) Is there any method of determining the trend of tenderness in terms 
of storage time? 

(e) Was there any appreciable gain in using the incomplete-blocks 
design for this problem? 

19.3. Moore and Bliss set up a balanced incomplete-blocks design to 
compare the toxicity of each of 7 chemicals on Aphis rumicis. The 
measure of the toxicity was the logarithm of the dose (+3.806) required for a 
95 per cent kill. Since only 3 chemicals could be tested on a given day, 
k = 3. Seven days were required to make a balanced design. The 
toxicities were as follows: 

                                      Day
Chemical    1        2        3        4        5        6        7        T_i
A           .465     .602              .443                                1.510
B           .343                                .652     .536              1.531
C                    .873     .875              1.142                      2.890
D           .396              .325                                .609     1.330
E                    .634                                .409     .417     1.460
F                                      .987     .989              .931     2.907
G                             .330     .426              .309              1.065
B_j         1.204    2.109    1.530    1.856    2.783    1.254    1.957    12.693

(a) Can you separate out replications in the analysis of these 
data? 

(b) Show how to determine the number of days required for a balanced 
experiment. 

(c) Make a complete analysis of these data. 

(d) C and F were basically the same chemical compound. Were they 
different from one another? How did they compare with the standard 
treatment, A? 

(e) How did A compare with all the others, excepting C and F? 

19.4. The data in the accompanying table represent the yields (in 
bushels per acre, minus 30 bushels) of a 5 X 5 simple lattice experiment 
on soybean varieties with 4 replications. The variety numbers are given 
in parentheses. 

(a) First analyze only the first two replications of this experiment, 
indicating whether or not there are significant varietal effects, the average 
standard error of adjusted varietal differences, and the efficiency of using 
the lattice design. 

(b) Second analyze the entire experiment, making the necessary 
adjustments with d = 2. 

(c) Show how this experiment might have been set out in the field. 

Data of Exercise 19.4 

Replication I                                                   Block totals
 (1)  6     (2)  7     (3)  5     (4)  8     (5)  6             32
 (6) 16     (7) 12     (8) 12     (9) 13    (10)  8             61
(11) 17    (12)  7    (13)  7    (14)  9    (15) 14             54
(16) 18    (17) 16    (18) 13    (19) 13    (20) 14             74
(21) 14    (22) 15    (23) 11    (24) 14    (25) 14             68
                                                                289

Replication II                                                  Block totals
 (1) 24     (6) 13    (11) 24    (16) 11    (21)  8             80
 (2) 21     (7) 11    (12) 14    (17) 11    (22) 23             80
 (3) 16     (8)  4    (13) 12    (18) 12    (23) 12             56
 (4) 17     (9) 10    (14) 30    (19)  9    (24) 23             89
 (5) 15    (10) 15    (15) 22    (20) 16    (25) 19             87
                                                                392

Replication III                                                 Block totals
 (1) 13     (2) 26     (3)  9     (4) 13     (5) 11             72
 (6) 15     (7) 18     (8) 22     (9) 11    (10) 15             81
(11) 19    (12) 10    (13) 10    (14) 10    (15) 16             65
(16) 21    (17) 16    (18) 17    (19)  4    (20) 17             75
(21) 15    (22) 12    (23) 13    (24) 20    (25)  8             68
                                                                361

Replication IV                                                  Block totals
 (1) 16     (6)  7    (11) 20    (16) 13    (21) 21             77
 (2) 15     (7) 10    (12) 11    (17)  7    (22) 14             57
 (3)  7     (8) 11    (13) 15    (18) 15    (23) 16             64
 (4) 19     (9) 14    (14) 20    (19)  6    (24) 16             75
 (5) 17    (10) 18    (15) 20    (20) 15    (25) 14             84
                                                                357

CHAPTER 20 

FACTORIAL EXPERIMENTS 

20.1. Introduction. The experimenter often wishes to test various 
types of treatments, each with several different representatives. For 
example, he might wish to compare two varieties of corn ($v_1$ and $v_2$) and 
two different fertilizers ($f_1$ and $f_2$) in a single experiment, giving a total of 
four treatment combinations ($v_1f_1$, $v_1f_2$, $v_2f_1$, $v_2f_2$); this is called a 2 X 2, or 
$2^2$, factorial experiment. These treatments could be tested in any of the 
field designs presented in Chaps. 18 and 19. The methods of analysis 
presented in those chapters would apply to this problem if all four 
treatment combinations were considered to be four unrelated treatments; 
however, if there are treatment differences, the factorial design can be 
used to explain these differences in a more definite manner. Using the 
factorial model, the effect of any treatment combination is considered to 
be the sum of three effects: varietal effect, fertilizer effect, and interaction 
of the variety and fertilizer. The interaction measures the failure of the 
various fertilizer effects to be the same for each variety or, conversely, the 
failure of the various varietal effects to be the same with each fertilizer. 

The interaction is the important effect about which the factorial design 
can give information. Many experimenters still examine the performance 
of one set of treatments, such as different fertilizers, for one standard 
variety and then different varieties for a standard fertilizer. Such an 
experiment tells little about the optimum fertilizer-variety combination 
which should be used, if the fertilizers do not respond in a similar manner 
for all varieties. Or if an engineer wants to know something about the 
relationship between the temperature of a process and the length of time 
the process is carried on, he needs to try out various combinations of the 
two variables, temperature and time. Similarly an animal feeder may 
want to know the optimum level of supplemental feeding and type of 
pasture or the optimum combination of concentrates and roughage in the 
ration. And the human nutritionist needs to know the best combination 
of various parts of the diet for healthy living. All of these experiments 
require some knowledge of how different amounts or kinds of one 
treatment interact with different amounts or kinds of another treatment. If 
the results are purely additive, that is, one treatment acts independently 
of the other treatment, the experiment can be divided into two simple 
experiments on the two treatments. However, the experimenter seldom is 
sure that there is no interaction and often is afraid that there will be some 
interaction, especially if the individual representatives of each treatment 
are widely different. 

It should be pointed out that the factorial design can never lose any 
information even when there is no interaction (and if there is interaction, 
this design is indispensable). To illustrate this, suppose we consider the 
two varieties and two fertilizers, each of the four treatment combinations 
being used r times in an experiment. If there is no interaction, 
the average difference between the two varieties is obtained from 2r 
individual differences, and similarly for the difference between the two 
fertilizers. Hence in the absence of interaction, we have obtained 2r 
replications for each variety and for each fertilizer. This feature of the 
factorial design, called hidden replication, should not be overemphasized, 
because it is built upon the thesis of no interaction. But it can be used 
as an argument against those who refuse to use this design, because they 
think they lose information when they have no interaction. The authors 
are firm believers in a wider use of the factorial type of experiment, 
because one cannot lose any information on the main effects and one can 
obtain information on something which may be equally important, the 
interactions. The reader is encouraged to read references 1 to 3 for more 
extensive discussions of the analysis and usefulness of various types of 
factorial designs. 

20.2. The Analysis of a. p X q Factorial Experiment. Let us assume 
we have two sets of treatments (A and C), with p A treatments and q C 
treatments, with a randomized complete-blocks experimental plan. The 
experimental model can be written as follows (assuming r blocks of pq 
treatments per block): 

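$$\begin{aligned}
Y_{ijk} &= \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} + \beta_k + \epsilon_{ijk} \\
        &= m + a_i + c_j + (ac)_{ij} + b_k + e_{ijk},
\end{aligned}$$

where $\alpha_i$ is the added effect of the ith A treatment (i = 1, 2, . . . , p); 
$\gamma_j$ is the added effect of the jth C treatment (j = 1, 2, . . . , q); $(\alpha\gamma)_{ij}$ is 
the added effect of the interaction of the ith A treatment and jth C treatment; 
and $\beta_k$ is the added effect of the kth block (k = 1, 2, . . . , r).† 
The estimates are $a_i$, $c_j$, $(ac)_{ij}$, and $b_k$, respectively. $\alpha_i$ and $\gamma_j$ are called 
the main effects. The following restrictions are imposed: 

$$\sum_i \alpha_i = 0; \qquad \sum_j \gamma_j = 0; \qquad \sum_i (\alpha\gamma)_{ij} = \sum_j (\alpha\gamma)_{ij} = 0; \qquad \sum_k \beta_k = 0.$$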


† If a completely randomized design is used, $\beta_k$ is omitted from the model; and for 
a Latin-square design, row and column constants replace $\beta_k$. The use of the 
incomplete-blocks design will be discussed later. 

It should be emphasized that the effect of a given A treatment is 
assumed to be the same for all C treatments; this assumption is justified 
only if there is no interaction. If there is an interaction, one should 
logically be studying different A effects for each C treatment and, conversely, 
different C effects for each A treatment. Hence, if there is a real 
interaction, the experimenter will want to compare the pq treatment 
combinations. The analysis of most factorial experiments is usually a 
two-stage process: 

(i) Test the null hypotheses that all main effects and interactions 
vanish. 

(ii) If there is a significant interaction, then determine whether there 
are real A effects for each C treatment and real C effects for each A 
treatment. 

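The least-squares equations for this model are 

$$\begin{aligned}
A_i &= qrm + qra_i = qr\mu + qr\alpha_i + \sum_j\sum_k \epsilon_{ijk}, \\
C_j &= prm + prc_j = pr\mu + pr\gamma_j + \sum_i\sum_k \epsilon_{ijk}, \\
(AC)_{ij} &= r[m + a_i + c_j + (ac)_{ij}] = r[\mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}] + \sum_k \epsilon_{ijk}, \\
B_k &= pqm + pqb_k = pq(\mu + \beta_k) + \sum_i\sum_j \epsilon_{ijk}, \\
G &= pqrm = pqr\mu + \sum_i\sum_j\sum_k \epsilon_{ijk},
\end{aligned}$$

where $A_i$ is the total yield for the qr plots using the ith A treatment, $C_j$ for 
the pr plots using the jth C treatment, $(AC)_{ij}$ for the r plots using the ith 
A treatment and the jth C treatment, $B_k$ for the pq plots of the kth block, 
and G for the total pqr plots. Hence the solutions are 

$$m = \frac{G}{pqr}, \qquad a_i = \frac{A_i}{qr} - m, \qquad c_j = \frac{C_j}{pr} - m, \qquad b_k = \frac{B_k}{pq} - m,$$

$$(ac)_{ij} = \frac{(AC)_{ij}}{r} - \frac{A_i}{qr} - \frac{C_j}{pr} + m.$$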


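It can be shown that the reductions in sum of squares due to the various 
treatment effects are: 

$$A:\quad \mathrm{SSA} = \sum_i \frac{A_i^2}{qr} - \frac{G^2}{pqr} = \sum_i \frac{(A_i - \bar{A})^2}{qr},$$

$$C:\quad \mathrm{SSC} = \sum_j \frac{C_j^2}{pr} - \frac{G^2}{pqr} = \sum_j \frac{(C_j - \bar{C})^2}{pr},$$

$$(AC):\quad \mathrm{SS}(AC) = \sum_i\sum_j \frac{(AC)_{ij}^2}{r} - \frac{G^2}{pqr} - \mathrm{SSA} - \mathrm{SSC},$$

where $\bar{A} = G/p$ and $\bar{C} = G/q$. 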

The analysis-of-variance table is 

Source of       Degrees of         Sum of     Mean 
variation       freedom            squares    square     E(MS) 

Blocks          r - 1              SSB        MSB        $\sigma^2 + pq\sum_k \beta_k^2/(r - 1)$ 
A treatments    p - 1              SSA        MSA        $\sigma^2 + qr\sum_i \alpha_i^2/(p - 1)$ 
C treatments    q - 1              SSC        MSC        $\sigma^2 + pr\sum_j \gamma_j^2/(q - 1)$ 
(AC)            (p - 1)(q - 1)     SS(AC)     MS(AC)     $\sigma^2 + r\sum_i\sum_j (\alpha\gamma)_{ij}^2/[(p - 1)(q - 1)]$ 
Error           (pq - 1)(r - 1)    SSE        $s^2$      $\sigma^2$ 

The F ratios, $\mathrm{MSA}/s^2$, $\mathrm{MSC}/s^2$, and $\mathrm{MS}(AC)/s^2$, can be used to test for the 
existence of real main effects and interactions. 

If the AC interaction were significantly greater than zero, the experimenter 
probably would want to test the A treatments for each C treatment 
and the C treatments for each A treatment. For each C treatment 
there would be (p - 1) degrees of freedom to test the A treatments; and 
for each A treatment, there would be (q - 1) degrees of freedom to test 
the C treatments. These two analyses are not orthogonal; hence, the 
significance levels may be somewhat upset, but the experimenter should 
make them, perhaps using $\alpha = .01$ or lower significance levels. When p 
and q are rather large, one would expect several significant F's by chance, 
because of the large number of comparisons. 

The extension of these results to more than two sets of treatments is 
simple, the only complication being the greater variety of interactions. 
For example if there are three sets of treatments (A, C, and D), we have 
the following interactions: 

Two-factor: AC, AD, CD, 

Three-factor: ACD. 

The two-factor interactions are handled as above, while the three-factor 
interaction is handled as the remainder after accounting for the main 
effects and two-factor interactions. A significant three-factor interaction 
is more difficult to interpret. Many experiments are planned with 
r = 1, but with the three-factor and higher-factor interactions^ used 
to estimate the error mean square. We do not have the space to discuss 
these wider uses of the factorial design in this book, but they are discussed 
in detail in references 1 and 3. 

Example 20.1. An experiment using 3 varieties of sugar cane (V) and 
3 levels of nitrogen (N) was conducted in Hawaii in 1942, with 4 replications. 
The nitrogen levels were, respectively, 150, 210, and 270 pounds 
per acre. The yields in tons of cane per acre were: 

Blocks    V1N1   V1N2   V1N3   V2N1   V2N2   V2N3   V3N1   V3N2   V3N3      Total 

  1       70.5   67.3   79.9   58.6   64.3   64.4   65.8   64.1   56.3      591.2 
  2       67.5   75.9   72.8   65.2   48.3   67.3   68.3   64.8   54.7      584.8 
  3       63.9   72.2   64.8   70.2   74.0   78.0   72.7   70.9   66.2      632.9 
  4       64.2   60.5   86.3   51.8   63.6   72.0   67.6   58.3   54.4      578.7 

Total    266.1  275.9  303.8  245.8  250.2  281.7  274.4  258.1  231.6    2,387.6 


It is usually best to arrange these results in a 3 X 3 table of means as 
follows: 

          V1       V2       V3       Mean 

N1       66.52    61.45    68.60    65.52 
N2       68.98    62.55    64.52    65.35 
N3       75.95    70.42    57.90    68.09 

Mean     70.48    64.81    63.67    66.32 

$$C = (2{,}387.6)^2/36 = 158{,}350.94.$$


The analysis of variance is: 

Source of      Degrees of     Sum of       Mean 
variation      freedom        squares      square 

Blocks             3            200.68      66.89 
Varieties          2            319.37     159.69* 
Nitrogen           2             56.54      28.27 
V X N              4            559.79     139.95* 
Error             24          1,053.78      43.91 

* Significant at the $\alpha = .05$ level. 
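The sums of squares in this table can be reproduced from the plot yields with the formulas of Sec. 20.2. The following Python sketch is only an illustration: the function name `factorial_rcb_anova` and the column ordering of the data are assumptions made here, not part of the original text.

```python
# Yields (tons of cane per acre) from Example 20.1.
# Rows: blocks 1-4. Columns: V1N1, V1N2, V1N3, V2N1, V2N2, V2N3, V3N1, V3N2, V3N3.
yields = [
    [70.5, 67.3, 79.9, 58.6, 64.3, 64.4, 65.8, 64.1, 56.3],
    [67.5, 75.9, 72.8, 65.2, 48.3, 67.3, 68.3, 64.8, 54.7],
    [63.9, 72.2, 64.8, 70.2, 74.0, 78.0, 72.7, 70.9, 66.2],
    [64.2, 60.5, 86.3, 51.8, 63.6, 72.0, 67.6, 58.3, 54.4],
]

def factorial_rcb_anova(y, p, q):
    """Sums of squares for a p x q factorial in randomized complete blocks.

    y[k][i*q + j] is the yield in block k for A level i and C level j.
    """
    r = len(y)
    G = sum(sum(row) for row in y)
    corr = G ** 2 / (p * q * r)                              # G^2 / pqr
    cell = [sum(row[t] for row in y) for t in range(p * q)]  # (AC)_ij totals
    A = [sum(cell[i * q + j] for j in range(q)) for i in range(p)]
    C = [sum(cell[i * q + j] for i in range(p)) for j in range(q)]
    B = [sum(row) for row in y]
    ss = {
        "blocks": sum(b * b for b in B) / (p * q) - corr,
        "A": sum(a * a for a in A) / (q * r) - corr,
        "C": sum(c * c for c in C) / (p * r) - corr,
    }
    ss["AC"] = sum(t * t for t in cell) / r - corr - ss["A"] - ss["C"]
    total = sum(v * v for row in y for v in row) - corr
    ss["error"] = total - ss["blocks"] - ss["A"] - ss["C"] - ss["AC"]
    return ss

ss = factorial_rcb_anova(yields, p=3, q=3)   # A = varieties, C = nitrogen
df = {"blocks": 3, "A": 2, "C": 2, "AC": 4, "error": 24}
ms = {k: ss[k] / df[k] for k in ss}
for k in ss:
    print(f"{k:7s} df={df[k]:2d}  SS={ss[k]:9.2f}  MS={ms[k]:7.2f}")
print("F(varieties)   =", round(ms["A"] / ms["error"], 2))
print("F(interaction) =", round(ms["AC"] / ms["error"], 2))
```

The error sum of squares is obtained by subtraction, as in the text, so any arithmetic slip in the treatment sums of squares would show up there as well.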
There were significant differences among the average varietal effects, 
but not among the average nitrogen effects. However, a significant 
interaction indicates that the nitrogens have different effects for each 
variety, possibly in opposite directions. In this case, N3 is best for V1 
and V2, while N1 is best for V3. The standard error for the difference 
between any two of the nine treatment means is $\sqrt{2(43.91)/4} = 4.68$. For 
V1, N3 - N1 (= 9.43) is almost significant and for V3, N1 - N3 (= 10.70) 
is significant, at the 5 per cent probability level. These results are displayed 
in Fig. 20.1. These lines show that V1 is better than V2 and that nitrogen 
affects both in the same way, with a decided jump in yield for the third level. 
However, nitrogen had an adverse effect on the third variety. A closer 
examination of the experiment disclosed that V3 with a large amount of 
nitrogen matured before the other two varieties, but all varieties were 
harvested at the same time. Hence some of the cane for V3 with N2 and N3 
had rotted, resulting in decreased yields for the high nitrogen treatments. 

20.3. An Alternative Analysis for a 2 X 2 Factorial Experiment. An 
alternative computing process for the treatment sums of squares makes 
use of the orthogonal linear forms introduced in Chap. 6. Since there are 
three treatment degrees of freedom, we can set up three orthogonal forms. 
These three orthogonal comparisons plus G are: 

                  Original treatment totals 
Comparison     (AC)11   (AC)12   (AC)21   (AC)22     $E(L^2)$ 

L1 = G           +        +        +        +        $16r^2\mu^2 + 4r\sigma^2$ 
L2 = A           -        -        +        +        $4r^2(\alpha_2 - \alpha_1)^2 + 4r\sigma^2$ 
L3 = C           -        +        -        +        $4r^2(\gamma_2 - \gamma_1)^2 + 4r\sigma^2$ 
L4 = (AC)        +        -        -        +        $r^2I^2 + 4r\sigma^2$ 

where $I = [(\alpha\gamma)_{11} - (\alpha\gamma)_{12} - (\alpha\gamma)_{21} + (\alpha\gamma)_{22}]$. The computing 
procedure for $L_2$ is simply to subtract the total yield of the first A treatment 
from that of the second A treatment, and similarly for the other L's. 
From $E(L_2^2)$ we see that if we set up the null hypothesis $H_0$: $\alpha_1 = \alpha_2$, 
$E(L_2^2) = 4r\sigma^2$, and if we divide $L_2^2$ by 4r, we have an unbiased estimate 
of $\sigma^2$. Hence, to test $H_0$, we need only compute the ratio 

$$F = \frac{L_2^2}{4rs^2}$$

with 1 and 3(r - 1) degrees of freedom. And we see that if $\alpha_1 \neq \alpha_2$, the 
expected value of the numerator of F is greater than that of the denominator; 
hence, the one-tailed F should be used. To test $H_0$: $\gamma_1 = \gamma_2$, we 
use $F = L_3^2/4rs^2$. It is easily seen that if we use $F = L_4^2/4rs^2$, we are 
testing whether $[(\alpha\gamma)_{11} - (\alpha\gamma)_{12}]$ is different from $[(\alpha\gamma)_{21} - (\alpha\gamma)_{22}]$. In 
other words, is the difference between the two C treatments the same 
using the first A treatment as using the second? Or, conversely, we can 
use $[(\alpha\gamma)_{11} - (\alpha\gamma)_{21}]$ versus $[(\alpha\gamma)_{12} - (\alpha\gamma)_{22}]$. Since this is exactly what 
we wanted to test by our interaction effect, it seems reasonable to use $L_4$. 

Fig. 20.1. Effects of different nitrogen treatments on the yield of 
three varieties of sugar cane. 

Example 20.2. An experiment was run to determine the effectiveness 
of four different fertilizers on the total yield of oranges over a 12-year 
period (1928 to 1939), at the Citrus Experiment Station, Riverside, Calif. 
The four fertilizers were nitrogen (N), nitrogen + phosphate (NP), 
nitrogen + potash (NK), and nitrogen + phosphate + potash (NPK). 
Four blocks were used, giving a total of 16 plots. The yields in pounds of 
oranges per tree per plot were as follows: 

                         Block 
Treatment       1        2        3        4        Total     Average 

N             1,290    1,531    1,469    1,631      5,921      1,480 
NP            1,085    1,348    1,555    1,328      5,316      1,329 
NK            1,479    1,484    1,556    1,759      6,278      1,570 
NPK           1,293    1,538    1,561    1,639      6,031      1,508 

Total         5,147    5,901    6,141    6,357     23,546      1,472 


The following orthogonal comparisons can be made: 

                        Treatment 
                N        NP       NK       NPK 
Total yield   5,921    5,316    6,278    6,031          L         $L^2/16$ 

L1 = G          +        +        +        +         23,546     34,650,882 
L2 = P          -        +        -        +           -852         45,369 
L3 = K          -        -        +        +          1,072         71,824 
L4 = PK         +        -        -        +            358          8,010 

Treatments                                                         125,203 


The total sum of squares is 

$$\sum y^2 = \sum Y^2 - G^2/16 = 35{,}067{,}370 - 34{,}650{,}882 = 416{,}488.$$

The block sum of squares is 

$$\mathrm{SSB} = 34{,}859{,}185 - 34{,}650{,}882 = 208{,}303.$$

Hence the error sum of squares is 

$$\mathrm{SSE} = \sum y^2 - \mathrm{SST} - \mathrm{SSB} = 82{,}982.$$

The analysis-of-variance table is as follows: 

Source of      Degrees of     Sum of       Mean 
variation      freedom        squares      square 

Blocks             3          208,303      69,434 
P                  1           45,369      45,369 
K                  1           71,824      71,824* 
PK                 1            8,010       8,010 
Error              9           82,982       9,220 

* Significant at the 5 per cent probability level. 
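The single-degree-of-freedom sums of squares above follow mechanically from the treatment totals. A short Python sketch is given here for checking the arithmetic; the helper name `comparison_ss` is hypothetical, and the error mean square is simply taken from the table rather than recomputed.

```python
# Treatment totals from Example 20.2, each based on r = 4 plots.
totals = [5921, 5316, 6278, 6031]          # N, NP, NK, NPK
r = 4

# Orthogonal comparisons: the sign applied to each treatment total.
comparisons = {
    "P":  [-1, +1, -1, +1],
    "K":  [-1, -1, +1, +1],
    "PK": [+1, -1, -1, +1],
}

def comparison_ss(coeffs, totals, r):
    """Return (L, SS), where SS = L^2 / (r * sum of squared coefficients)."""
    L = sum(c * t for c, t in zip(coeffs, totals))
    return L, L ** 2 / (r * sum(c * c for c in coeffs))

error_ms = 82982 / 9                        # error mean square from the table
results = {}
for name, coeffs in comparisons.items():
    L, ss = comparison_ss(coeffs, totals, r)
    results[name] = (L, ss, ss / error_ms)
    print(f"{name:2s}  L={L:6d}  SS={ss:9.2f}  F={ss / error_ms:5.2f}")
```

With four totals and coefficients of plus or minus one, the divisor r times the sum of squared coefficients is 16, which is the $L^2/16$ column of the comparison table.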

Since the 5 per cent F value for 1 and 9 degrees of freedom is 5.12, we 
see that P is not quite significant, while K is significant. There is no 
indication of a real PK interaction. It would appear that added 
potash definitely increased the yield and that added phosphate 
possibly decreased the yield of oranges. In order to determine 
if there was a significant profit from the use of the added fertilizer, 
the cost of the fertilizer should be compared with the added returns 
from the oranges. These results are displayed graphically in Fig. 
20.2. There is some evidence of a downward slope, indicating the 
detrimental effect of phosphate. Since the two lines are nearly 
parallel, there is no indication of an interaction between phosphate 
and potash. 

Fig. 20.2. Effects of different amounts of potash and phosphate on the 
yield of oranges. (Horizontal axis: level of phosphate.) 

20.4. The Alternative Analysis of a p X q Factorial. The general 
p X q factorial experiment can also be analyzed by the single-degree-of-freedom 
approach of Sec. 20.3. Often the representatives of one or both 
sets of treatments are simply levels of that treatment. For example, the 
treatments might be three levels of a nitrogen fertilizer (A) and three 
levels of a phosphate fertilizer (C), as in Exercise 20.4. An example of 
three fertilizer levels and two varieties is presented in Example 20.3. 
The experimenter should decide before the experiment is performed what 
comparisons should be made. It is not necessary to restrict oneself to 
orthogonal comparisons, except that most pertinent comparisons can be 
set up as one of a set of orthogonal comparisons. The significance 
probabilities presumably are less disturbed if each comparison is orthogonal to 
the others. Snedecor^ (Chap. 15) discusses these single comparisons. 

A factorial experiment with more than two sets of treatments can also 
be analyzed by the single-degree-of-freedom approach. 

Matrix algebra techniques can be used to prove that the total treatment 
sum of squares obtained by the single-degree-of-freedom approach is the 
same as the treatment sum of squares of Chap. 18. 

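If there are t treatment combinations and we set up t orthogonal 
comparisons (such as the 4 L's in Sec. 20.3), 

$$\sum_{h=1}^{t} \mathrm{SS}_h = \sum_{i=1}^{t} \frac{T_i^2}{r},$$

where $\mathrm{SS}_h = L_h^2/[\text{coefficient of } \sigma^2 \text{ in } E(L_h^2)]$ and $T_i$ is the total yield for the 
ith treatment combination (based on r plots). We consider that there 
are t orthogonal forms 

$$L_h = \sum_{i=1}^{t} a_{hi}T_i, \qquad h = 1, 2, \ldots, t,$$

where 

$$\sum_{i=1}^{t} a_{hi}a_{h'i} = \begin{cases} 0 & \text{for } h \neq h', \\ 1 & \text{for } h = h'. \end{cases}$$

If we solve for the $T_i$ in terms of the $L_h$, we find that 

$$T_i = \sum_{h=1}^{t} a_{hi}L_h.$$

Hence 

$$\sum_i T_i^2 = \sum_i \Big(\sum_h a_{hi}L_h\Big)^2 = \sum_h L_h^2 \sum_i a_{hi}^2 + \sum_{h \neq h'} L_hL_{h'} \sum_i a_{hi}a_{h'i} = \sum_{h=1}^{t} L_h^2,$$

and the coefficient of $\sigma^2$ in $E(L_h^2)$ is r, since 

$$\sigma(T_iT_{i'}) = \begin{cases} 0 & \text{for } i \neq i', \\ r\sigma^2 & \text{for } i = i', \end{cases} \qquad \text{and} \qquad \sum_i a_{hi}^2 = 1.$$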
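The identity can be checked numerically on the orange data of Example 20.2. In the sketch below the forms are written with coefficients of plus or minus one rather than normalized to unit length, so each squared form is divided by r times its sum of squared coefficients; that normalization step is an assumption of this illustration and stands in for the requirement that the squared coefficients of each form sum to one.

```python
# Treatment totals of Example 20.2 (N, NP, NK, NPK), each based on r = 4 plots.
T = [5921, 5316, 6278, 6031]
r = 4

# t = 4 mutually orthogonal forms: G and the P, K, PK comparisons.
forms = [
    [+1, +1, +1, +1],   # G
    [-1, +1, -1, +1],   # P
    [-1, -1, +1, +1],   # K
    [+1, -1, -1, +1],   # PK
]

# Orthogonality: every pair of distinct forms has zero inner product.
orthogonal = all(
    sum(a * b for a, b in zip(forms[h], forms[k])) == 0
    for h in range(len(forms)) for k in range(h + 1, len(forms))
)

# SS_h = L_h^2 / (r * sum_i c_hi^2); dividing by sum c^2 normalizes the form.
ss = []
for c in forms:
    L = sum(ci * ti for ci, ti in zip(c, T))
    ss.append(L ** 2 / (r * sum(ci * ci for ci in c)))

lhs = sum(ss)                       # sum over h of SS_h
rhs = sum(t * t for t in T) / r     # sum over i of T_i^2 / r
print(orthogonal, lhs, rhs)
```

Dropping the G form from the left side and subtracting $G^2/16$ from the right side gives the treatment sum of squares on both sides.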


Example 20.3. An experiment on a small grain was conducted with 
two varieties (v1 and v2), three fertilizer levels (f1, f2, and f3), and six blocks. 
The yields in pounds per plot were: 

                        Block                                     Linear forms 
 F   V      1      2      3      4      5      6     Total     Fl     Fq      V    FlV    FqV 

 1   1    161    166    113    103    132    180      855      -      +      -      +      - 
 1   2    192    253    208    171    196    198    1,218      -      +      +      -      + 
 2   1    145    231    131    158    176    216    1,057      0     -2      -      0      2 
 2   2    232    231    190    171    242    238    1,304      0     -2      +      0     -2 
 3   1    172    204    104    135    178    175      968      +      +      -      -      - 
 3   2    227    214    144    146    186    230    1,147      +      +      +      +      + 

Total   1,129  1,299    890    884  1,110  1,237    6,549     42   -534    789   -184     48 

In this case we need two linear forms to represent the fertilizer effects. 
In general the experimenter wants to know whether there is a consistent 
linear trend, as indicated by $F_l$, and whether there is a significant departure 
from linearity, in this case indicated by the quadratic component, 
$F_q$. $F_l$ measures the change from the lowest to the highest level of fertilizer, 
while $F_q$ compares the sum of the yields at these two extreme levels 
against twice the middle yield. If the response of yield to fertilizer is 
linear, $F_q$ will not be significantly different from zero. The two interaction 
forms are simply the respective products of $F_l$ and $F_q$ with V. 
$(F_lV)$ represents the failure of the linear trend to be the same for the two 
varieties. That is, $(F_lV) = [(FV)_{32} - (FV)_{12}] - [(FV)_{31} - (FV)_{11}]$. A 
similar statement can be made for $(F_qV)$. 
The expected values of the squares of the linear forms are: 

Linear form         $E(L^2)$                                   $L^2/(\text{coeff. of } \sigma^2)$ 

$F_l$    =   42       $144(f_3 - f_1)^2 + 24\sigma^2$                     73.50 
$F_q$    = -534       $144(f_1 - 2f_2 + f_3)^2 + 72\sigma^2$           3,960.50 
$V$      =  789       $324(v_2 - v_1)^2 + 36\sigma^2$                 17,292.25 
$(F_lV)$ = -184       $144I_l^2 + 24\sigma^2$                          1,410.67 
$(F_qV)$ =   48       $144I_q^2 + 72\sigma^2$                             32.00 

                                                        SST = 22,768.92 

$I_l = \tfrac{1}{2}(fv_{32} - fv_{12} - fv_{31} + fv_{11})$, and $I_q = \tfrac{1}{2}(fv_{32} - 2fv_{22} + fv_{12} - fv_{31} + 2fv_{21} - fv_{11})$. 

We have used f and v for the true fertilizer and variety effects to avoid the difficulty of 
identification with Greek letters. 

The analysis of variance is: 

Source of      Degrees of     Sum of         Mean 
variation      freedom        squares        square 

Blocks             5          24,938.92      4,987.78 
$F_l$              1              73.50         73.50 
$F_q$              1           3,960.50      3,960.50** 
$V$                1          17,292.25     17,292.25** 
$(F_lV)$           1           1,410.67      1,410.67 
$(F_qV)$           1              32.00         32.00 
Error             25           9,896.91        395.88 

** Significant at the 1 per cent probability level. 
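The five treatment comparisons of this example can be generated and checked in a few lines. The sketch below builds the two interaction forms as elementwise products of the fertilizer forms with the variety form, as the text describes; the names `F_lin`, `F_quad`, and `form_product` are hypothetical, and the error mean square is taken from the table above.

```python
# Treatment totals from Example 20.3 (six blocks each), in the order
# f1v1, f1v2, f2v1, f2v2, f3v1, f3v2.
totals = [855, 1218, 1057, 1304, 968, 1147]
r = 6

F_lin  = [-1, -1,  0,  0, +1, +1]     # linear fertilizer component
F_quad = [+1, +1, -2, -2, +1, +1]     # quadratic fertilizer component
V      = [-1, +1, -1, +1, -1, +1]     # variety comparison

def form_product(a, b):
    """Interaction form: elementwise product of two main-effect forms."""
    return [x * y for x, y in zip(a, b)]

forms = {
    "Fl": F_lin, "Fq": F_quad, "V": V,
    "FlV": form_product(F_lin, V), "FqV": form_product(F_quad, V),
}

error_ms = 395.88                      # error mean square from the table
ss = {}
for name, c in forms.items():
    L = sum(ci * ti for ci, ti in zip(c, totals))
    # Divisor r * sum(c^2) is the coefficient of sigma^2 in E(L^2).
    ss[name] = L ** 2 / (r * sum(ci * ci for ci in c))
    print(f"{name:3s}  L={L:5d}  SS={ss[name]:10.2f}  F={ss[name] / error_ms:6.2f}")
print("treatment SS =", round(sum(ss.values()), 2))
```

The divisors come out as 24, 72, 36, 24, and 72, matching the coefficients of $\sigma^2$ listed with the expected values.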

Apparently the effect of fertilizer was definitely nonlinear, with 
the maximum yield near the middle rate of application and 
the yield at the highest rate little more than the yield at the lowest 
rate. There was a real difference between the two varieties, but 
the fertilizer effects were about the same for the two varieties as 
shown by the nonsignificant interaction. These statements are 
confirmed graphically in Fig. 20.3, where we see that both lines are 
almost parallel and that each shows a decline for the third level of 
fertilizer. 

Fig. 20.3. Effects of different levels of fertilizer on the yields of two 
varieties of small grain. (Horizontal axis: level of fertilizer.) 

The estimated efficiency of the randomized complete-blocks design 
as compared with a completely randomized design (neglecting the loss of 
error degrees of freedom) is 

$$\frac{30(395.88) + 24{,}938.92}{35(395.88)} = 2.66,$$

indicating an expected increase of about 165 per cent in the error variance 
if the completely randomized design had been used. 
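The efficiency ratio is a one-line computation. The sketch below follows the ratio as it is read from the text (block sum of squares pooled with 30 times the error mean square, over 35 times the error mean square); the function name `rcb_efficiency` and its argument list are assumptions made for illustration.

```python
def rcb_efficiency(ss_blocks, df_blocks, ms_error, df_treat, df_error):
    """Estimated efficiency of randomized blocks relative to complete randomization.

    The numerator is the error variance expected without blocking: the block
    sum of squares is pooled with (df_treat + df_error) times the error mean
    square and divided by the total degrees of freedom.
    """
    total_df = df_blocks + df_treat + df_error
    crd_variance = (ss_blocks + (df_treat + df_error) * ms_error) / total_df
    return crd_variance / ms_error

eff = rcb_efficiency(24938.92, 5, 395.88, 5, 25)
print(round(eff, 2))
```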

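If the experimenter were interested only in the over-all fertilizer effects, 
he would compute 

$$\mathrm{SSF} = \frac{F_1^2 + F_2^2 + F_3^2}{12} - \frac{G^2}{36} = 4{,}034,$$

$$\mathrm{SS}(FV) = \frac{(FV)_{11}^2 + \cdots + (FV)_{32}^2}{6} - \frac{G^2}{36} - \mathrm{SSV} - \mathrm{SSF} = 1{,}442.67.$$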


20.5. The Analysis of a p X q Factorial Experiment with Unequal 
Numbers in the Subclasses. The factorial experiments in the previous 
sections of this chapter have had a fixed number of replications (r) per 
treatment combination. If the number of replications is not the same 
from one subclass to another, the analysis is much more complicated. 
Snedecor^ shows that if the numbers are proportional to the main effect 
totals, the usual analysis can be carried out. That is, assume there are 
$n_{ij}$ entries for the subclass with the ith A treatment and the jth C treatment 
and $n_{i.} = \sum_j n_{ij}$, $n_{.j} = \sum_i n_{ij}$, $n = \sum_i\sum_j n_{ij}$. If $n_{ij} = n_{i.}n_{.j}/n$, the 
least-squares equations show that the A, C, and AC effects are orthogonal and 
the usual analysis-of-variance methods apply. 
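Whether a table of subclass numbers is proportional can be checked exactly with integer arithmetic. In this sketch the function name `is_proportional` and the first table of counts are hypothetical; the second table uses the rat counts that appear in the least-squares equations of Example 20.4.

```python
def is_proportional(n):
    """True if every n_ij equals n_i. * n_.j / n (checked without division)."""
    row = [sum(r) for r in n]            # n_i.
    col = [sum(c) for c in zip(*n)]      # n_.j
    total = sum(row)                     # n
    return all(
        n[i][j] * total == row[i] * col[j]
        for i in range(len(n)) for j in range(len(n[0]))
    )

proportional = [[2, 4], [3, 6]]                      # hypothetical counts
rats = [[21, 27], [15, 25], [12, 23], [7, 19]]       # counts from Example 20.4
print(is_proportional(proportional), is_proportional(rats))
```

Multiplying through by n avoids any rounding question that division would raise.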

However, if the subclass numbers are not proportional, the A, C, and 
AC effects are confounded, requiring a complete least-squares solution of 
the type presented in Chap. 15. This solution is simplified if it can be 
assumed that there is no interaction, that is, the only effects are those for 
A and C. Several alternative computing procedures have been outlined 
by Yates,^ Snedecor and Cox,^ and Snedecor.^ The simplest procedure 
is to compute the mean yield for each of the pq subclasses and to run 
an analysis of variance of these means. An estimate of the error variance 
to test for A, C, and AC effects is obtained from the sum of squares 
within subclasses with n - pq degrees of freedom. First compute 
$s^2 = \mathrm{SS(within)}/(n - pq)$. Since the first analysis is based on means, 
$s^2$ must be divided by the average number of entries per subclass. The 
average used is the harmonic mean, $n_h$, where 

$$\frac{1}{n_h} = \frac{1}{pq}\sum_i\sum_j \frac{1}{n_{ij}}.$$

The appropriate error variance to use with the analysis of the means is 
$s'^2 = s^2/n_h$. 

The analysis of variance is: 

Source of      Degrees of          Mean 
variation      freedom             square 

A              p - 1               MSA 
C              q - 1               MSC 
AC             (p - 1)(q - 1)      MS(AC) 
Error          n - pq              $s'^2 = s^2/n_h$ 


If some of the subclasses have no entries ($n_{ij} = 0$), this method of 
unweighted means cannot be used. Hence it has limited usefulness for the 
analysis of many social data, for which empty subclasses are common. 
However, if it can be used, the method has a minimum of computation 
and furnishes a short-cut procedure of testing for the existence of 
interactions. This is important information in applying other methods. 

If the $n_{ij}$ are almost proportional, an approximate method, called the 
method of proportional subclass numbers, is probably better than the 
method of unweighted means. However, the authors doubt that the 
gain is often worth the extra computing time required. (See references 
4 and 8 for the computing details.) 

A third alternative is the method of weighted squares of means, advanced 
by Yates^ and also described in references 4 and 8. This method provides 
exact tests of the main effects when interaction is present and for p = 2 
also provides an exact test of the interaction. 

The method of least squares furnishes an exact test for interactions 
regardless of the size of p and, if there is no interaction, an exact test of 
the main effects. First we assume no interaction and use the following 
model: 


$$Y_{ijk} = \mu + \alpha_i + \gamma_j + \epsilon_{ijk}, \qquad i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, q;\ k = 1, 2, \ldots, n_{ij},$$

where $\alpha_i$ is the added effect of the ith A treatment, $\gamma_j$ the added effect of 
the jth C treatment, and k the kth entry in the (i,j) subclass. In order 
to have $\mu = \bar{Y}$, we set $\sum_{i=1}^{p} n_{i.}\alpha_i = \sum_{j=1}^{q} n_{.j}\gamma_j = 0$. The error sum of 
squares (SSE') for this model is compared with the within subclass sum 
of squares (SSE) for the complete model 

$$Y_{ijk} = \mu + \tau_{ij} + \epsilon_{ijk},$$

where $\tau_{ij}$ represents the effect of the (i,j) subclass. The error variance 
will be $s^2 = \mathrm{SSE}/(n - pq)$. 





The first model is used to determine m, $a_i$, and $c_j$, using 

$$\hat{T}_{ij} = n_{ij}(m + a_i + c_j),$$

where $\hat{T}_{ij}$ is the predicted total yield for the (i,j) subclass with $n_{ij}$ entries 
and $\sum_i n_{i.}a_i = 0 = \sum_j n_{.j}c_j$. The least-squares equations can be written 
as follows: 

$$\begin{aligned}
m:&\quad nm + \sum_{i=1}^{p} n_{i.}a_i + \sum_{j=1}^{q} n_{.j}c_j = G, \\
a_i:&\quad n_{i.}m + n_{i.}a_i + \sum_{j=1}^{q} n_{ij}c_j = A_i, \\
c_j:&\quad n_{.j}m + \sum_{i=1}^{p} n_{ij}a_i + n_{.j}c_j = C_j,
\end{aligned}$$

where $A_i$ and $C_j$ are respective treatment totals. Since 
$\sum_i n_{i.}a_i = \sum_j n_{.j}c_j = 0$, $m = G/n = \bar{Y}$. In order to solve for the $\{a_i\}$ and 
$\{c_j\}$, a method of matrix inversion or the forward solution of the abbreviated 
Doolittle method can be used (see Chap. 15). 

However, a short-cut procedure is available if the inverse matrix is 
not wanted. This short cut is the same as the method of adjusting for 
block effects in an incomplete-blocks design (see Sec. 19.2) and is 
frequently called the sweep-out method. The $c_j$ are adjusted for the $a_i$ by 
multiplying the $a_i$ equation by $(n_{ij}/n_{i.})$ and subtracting the sum of all 
these altered equations from the $c_j$ equation. The resultant equation is 

$$n'_{j1}c_1 + \cdots + n'_{jj}c_j + \cdots + n'_{jq}c_q = C'_j,$$

where 

$$n'_{jl} = n_{.j} - \sum_{i=1}^{p} \frac{n_{ij}^2}{n_{i.}}, \quad l = j; \qquad n'_{jl} = -\sum_{i=1}^{p} \frac{n_{ij}n_{il}}{n_{i.}}, \quad l \neq j;$$

$$C'_j = C_j - \sum_{i=1}^{p} \frac{n_{ij}A_i}{n_{i.}}.$$

These q equations in the $\{c_j\}$ can then be handled by the forward solution 
of the abbreviated Doolittle method to determine SSC (adjusted for 
A) and the $\{c_j\}$. Since $\sum_j n_{.j}c_j = 0$, the constants in the $c_q$ row will be 
zero. Hence the $C'_q$ equation is often omitted and the effect $c_q$ from the 
other equations. However, it might be advisable to retain $c_q$ and the $C'_q$ 
equation as a check on the computations. The shortest computing time 
is achieved when p > q; if p < q, reverse the A and C treatments in the 
analysis. 

The analysis of variance can now be set up. 

$$\mathrm{SSA(unadjusted\ for\ } C) = \sum_i \frac{A_i^2}{n_{i.}} - \frac{G^2}{n},$$

and SSC (adjusted for A) is computed from the y column of the abbreviated 
Doolittle computations. Hence the total sum of squares for A and 
C is 

$$\mathrm{SS}(A + C) = \mathrm{SSA(unadj)} + \mathrm{SSC(adj)}.$$

The total sum of squares for all pq subclasses is 

$$\sum_i\sum_j \frac{T_{ij}^2}{n_{ij}} - \frac{G^2}{n},$$

where $T_{ij}$ is the total yield in the (i,j) subclass. Hence the residual, due 
to interaction, is 

$$\mathrm{SS}(AC) = \sum_i\sum_j \frac{T_{ij}^2}{n_{ij}} - \sum_i \frac{A_i^2}{n_{i.}} - \mathrm{SSC(adj)}.$$

Also SSA (adjusted for C) can be computed by subtraction as follows: 

$$\mathrm{SSA(adj)} = \mathrm{SS}(A + C) - \mathrm{SSC(unadj)},$$

where $\mathrm{SSC(unadj)} = \sum_j C_j^2/n_{.j} - G^2/n$. 

The analysis of variance is: 

Source of      Degrees of          Sum of        Mean 
variation      freedom             squares       square 

A(unadj)       p - 1 
C(adj)         q - 1               SSC(adj)      MSC 
C(unadj)       q - 1 
A(adj)         p - 1               SSA(adj)      MSA 
AC             (p - 1)(q - 1)      SS(AC)        MS(AC) 
Error          n - pq              SSE           $s^2$ 
Example 20.4. Some data were presented in Exercise 18.5 on the 
gain in weight of 149 rats for four successive generations, both male and 
female.^ In order to test for generation and sex effects and the 
(generation X sex) interaction, an analysis for unequal subclass numbers is 
needed. The analysis of variance for the unweighted means is as follows: 

Source of      Degrees of     Mean 
variation      freedom        square 

Generations        3             42.08 
Sex                1          6,394.67 
G X S              3             58.22 
Error            141             26.27 

In computing the error mean square, $n_h = 15.576$ and 
$s^2 = 57{,}695/141 = 409.2$. 
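The harmonic mean and the resulting error mean square for the analysis of means can be checked as follows. This is a sketch under one assumption: the eight subclass counts are read from the coefficients of the least-squares equations given later in this example, since the counts themselves are not tabulated in this section.

```python
# Subclass counts n_ij (generation x sex), read from the coefficients of the
# least-squares equations of this example.
counts = [21, 27, 15, 25, 12, 23, 7, 19]

n_h = len(counts) / sum(1 / n for n in counts)   # harmonic mean of the n_ij
s2 = 57695 / 141                                  # within-subclass mean square
s2_means = s2 / n_h                               # error variance for the means

print(round(n_h, 3), round(s2, 1), round(s2_means, 2))
print("F(G x S) =", round(58.22 / s2_means, 2))
```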


The only significant effect is sex, but the F value for interaction of 2.22 is 
slightly larger than the $F_{.10}$ value for 3 and 141 degrees of freedom. 
Hence one might be hesitant about concluding there was no interaction; 
apparently the main differences were sex differences with the possibility 
that these differences were not the same from one generation to another. 

In order to make the test for interaction more exact, we present the 
least-squares solution. The least-squares equations, omitting the interaction 
constants, are ($a_i$ for generation and $c_j$ for sex effects): 

$$\begin{aligned}
m:&\quad 149m + 48a_1 + 40a_2 + 35a_3 + 26a_4 + 55c_1 + 94c_2 = 19{,}537, \\
a_1:&\quad 48m + 48a_1 + 21c_1 + 27c_2 = 6{,}673, \\
a_2:&\quad 40m + 40a_2 + 15c_1 + 25c_2 = 5{,}274, \\
a_3:&\quad 35m + 35a_3 + 12c_1 + 23c_2 = 4{,}364, \\
a_4:&\quad 26m + 26a_4 + 7c_1 + 19c_2 = 3{,}226, \\
c_1:&\quad 55m + 21a_1 + 15a_2 + 12a_3 + 7a_4 + 55c_1 = 9{,}203, \\
c_2:&\quad 94m + 27a_1 + 25a_2 + 23a_3 + 19a_4 + 94c_2 = 10{,}334.
\end{aligned}$$


The restrictions are 

$$48a_1 + 40a_2 + 35a_3 + 26a_4 = 0; \qquad 55c_1 + 94c_2 = 0.$$

The a effects are swept out of the c equations as follows: 

$$\begin{aligned}
n'_{11} &= 55 - \left[\frac{(21)^2}{48} + \cdots + \frac{(7)^2}{26}\right] = 34.18860, \\
n'_{12} = n'_{21} &= -\left[\frac{(21)(27)}{48} + \cdots + \frac{(7)(19)}{26}\right] = -34.18860, \\
n'_{22} &= 94 - \left[\frac{(27)^2}{48} + \cdots + \frac{(19)^2}{26}\right] = 34.18860, \\
C'_1 &= 9{,}203 - \left[\frac{(21)(6{,}673)}{48} + \cdots + \frac{(7)(3{,}226)}{26}\right] = 1{,}941.045, \\
C'_2 &= 10{,}334 - \left[\frac{(27)(6{,}673)}{48} + \cdots + \frac{(19)(3{,}226)}{26}\right] = -1{,}941.045.
\end{aligned}$$

Hence the adjusted c equations are 

$$34.18860(c_1 - c_2) = 1{,}941.045,$$

$$-34.18860(c_1 - c_2) = -1{,}941.045.$$

If we neglect the $C'_2$ equation and $c_2$ (or set $c'_1 = c_1 - c_2$), we have 

$$c'_1 = 56.77463$$

and 

$$\mathrm{SSC(adj)} = c'_1(1{,}941.045) = 110{,}202.1.$$


Also 

$$\mathrm{SSA(unadj)} = \frac{(6{,}673)^2}{48} + \cdots + \frac{(3{,}226)^2}{26} - \frac{(19{,}537)^2}{149} = 5{,}756.4,$$

and 

$$\mathrm{SSC(unadj)} = \frac{(9{,}203)^2}{55} + \frac{(10{,}334)^2}{94} - \frac{(19{,}537)^2}{149} = 114{,}286.1.$$

The total sum of squares for all eight subclasses was 

$$\frac{(3{,}716)^2}{21} + \cdots + \frac{(2{,}029)^2}{19} - \frac{(19{,}537)^2}{149} = 119{,}141.0.$$

Hence the interaction sum of squares is 

119,141.0 - 110,202.1 - 5,756.4 = 3,182.5. 
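The sweep-out arithmetic of this example can be retraced in a short Python sketch. The helper names `swept_coeff` and `swept_total` are hypothetical. The total sum of squares for the eight subclasses is quoted from the text rather than recomputed, because only two of the subclass totals are shown in this section.

```python
# Example 20.4 quantities, as read from the least-squares equations.
n = [[21, 27], [15, 25], [12, 23], [7, 19]]   # n_ij: 4 generations x 2 sexes
A = [6673, 5274, 4364, 3226]                  # generation totals A_i
C = [9203, 10334]                             # sex totals C_j
G = sum(A)
p = len(n)
n_i = [sum(row) for row in n]                 # n_i.
n_j = [sum(col) for col in zip(*n)]           # n_.j
N = sum(n_i)

def swept_coeff(j, l):
    """Coefficient n'_jl of c_l in the c_j equation after sweeping out the a_i."""
    s = sum(n[i][j] * n[i][l] / n_i[i] for i in range(p))
    return n_j[j] - s if j == l else -s

def swept_total(j):
    """Adjusted right-hand side C'_j."""
    return C[j] - sum(n[i][j] * A[i] / n_i[i] for i in range(p))

n11, n12, n22 = swept_coeff(0, 0), swept_coeff(0, 1), swept_coeff(1, 1)
C1_adj, C2_adj = swept_total(0), swept_total(1)

c1_prime = C1_adj / n11                       # estimate of c_1 - c_2
ssc_adj = c1_prime * C1_adj
ssa_unadj = sum(a * a / m for a, m in zip(A, n_i)) - G * G / N
ssc_unadj = sum(c * c / m for c, m in zip(C, n_j)) - G * G / N

ss_subclasses = 119141.0    # quoted from the text, not recomputed here
ss_ac = ss_subclasses - ssc_adj - ssa_unadj
ssa_adj = ssa_unadj + ssc_adj - ssc_unadj

print(round(n11, 5), round(n12, 5), round(n22, 5))
print(round(C1_adj, 3), round(C2_adj, 3), round(c1_prime, 5))
print(round(ssc_adj, 1), round(ssa_unadj, 1), round(ssc_unadj, 1))
print(round(ss_ac, 1), round(ssa_adj / 3), round(ss_ac / 3 / 409.2, 2))
```

Small differences in the last printed digit relative to the text come from the text's use of rounded intermediate values.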
The analysis of variance is: 

Source of variation      Degrees of freedom      Mean square 

Generations                      3 
Sex (adj)                        1                 110,202 
Sex                              1 
Generation (adj)                 3                     557 
Sex X generation                 3                   1,061 
Error                          141                   409.2 





The F value to test for interaction is 

F = 1,061/409.2 = 2.59, 

with 3 and 141 degrees of freedom, which is not quite significant at the 
5 per cent probability level ($F_{.05} = 2.67$). Note that the exact F is 
slightly larger than the F using the method of unweighted means. This 
is generally true, as the exact method is somewhat more powerful. 

If we conclude that there is no real interaction, we can use the error 
mean square (409.2) to test for the adjusted sex and generation effects, 
showing a highly significant sex difference but no real difference in gains 
for the four generations. If we want a more exact test of over-all sex 
effects without neglecting the interaction than the test using the 
unweighted means, interaction constants can be inserted in the model and 
the main effects adjusted for the interaction, or the method of weighted 
squares of means can be used. However, the average main effect has 
little meaning if interaction is present, because a real interaction indicates 
that the sex difference is not the same from one generation to another. 
Hence it would appear better to test for sex difference in each generation, 
using either the pooled or a separate $s^2$ for each generation. Pooling 
is justified only if the within subclass variance is constant from one 
subclass to another. We have assumed the constancy of these variances, 
but the reasonableness of the assumption should be checked. 

It might be mentioned that a significant interaction is often evidence 
of a multiplicative relationship. In this case if we analyzed Z = log Y, 
the assumption of additivity would be more nearly correct. However, 
it should be cautioned that if we analyze Z = log Y, we must assume that 
the errors in Y also multiply so they become additive for Z. 
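A small numerical illustration of this remark, using made-up cell means that are exactly multiplicative (the values of `a` and `c` below are hypothetical and not taken from any example in the text): the 2 X 2 interaction comparison is far from zero on the original scale and vanishes after taking logarithms.

```python
import math

# Hypothetical cell means built multiplicatively: mean_ij = a_i * c_j.
a = [10.0, 20.0]
c = [1.0, 3.0]
cells = [[ai * cj for cj in c] for ai in a]

def interaction_contrast(m):
    """2 x 2 interaction comparison: m11 - m12 - m21 + m22."""
    return m[0][0] - m[0][1] - m[1][0] + m[1][1]

raw = interaction_contrast(cells)
logged = interaction_contrast([[math.log(v) for v in row] for row in cells])
print(raw, round(logged, 12))
```

This only concerns the means; as the text cautions, the transformation helps the analysis only if the errors are multiplicative as well.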

20.6. The Use of Incomplete-blocks Designs for Factorial Experiments. 
As mentioned in Chap. 19, when all the treatments are not used 
in each block, the treatment and block effects become confounded. Since 
the factorial designs are constructed to obtain information on single 
comparisons, it seems reasonable to attempt to confound some of the 
less important comparisons and leave the more important comparisons 
free of block effects. For example, if we use a 2X2X2 factorial 
experiment and only 4 plots are available per block, it would be desirable 
to confound the ACD three-factor interaction and leave the other 
effects free of block effects. 

Designate the 8 treatment combinations as 

111, 211, 121, 221, 112, 212, 122, 222, 

where the first number refers to the A treatment (1 or 2), the second to the 
C treatment, and the third to the D treatment. If we put treatments 
(111, 221, 212, 122) in one block and (211, 121, 112, 222) in the second 
block, the main effects and two-factor interactions will be clear of block 
effects and ACD will be completely confounded with block effects. However, 
if we repeat the experiment several times so that we have r replications 
(2r blocks), a test can be made of ACD, using methods to be discussed 
in Chap. 23 for split-plot designs. We shall defer a discussion of 
this point until then and simply indicate the analysis-of-variance degrees 
of freedom as follows: 


Source of      Degrees of 
Variation      Freedom 

Blocks         2r - 1 
A              1 
C              1 
D              1 
AC             1 
AD             1 
CD             1 
Error          6(r - 1) 


The general procedure of confounding in factorial experiments is 
extensively discussed in references 1 to 3. The reader is encouraged to 
read these references if he is interested in setting up factorial experiments 
with more treatment combinations than plots per block. Some exercises 
are included at the end of the chapter. 

20.7. Construction of Experimental Designs. In all of the theoretical 
discussions in this and previous chapters, we have assumed that the design 
was known and have proceeded to set up an analysis for this design. 
The reader might be interested in knowing how to construct designs in the 
first place so that the analysis will be relatively simple. 

(a) There is no difficulty in setting up completely randomized or 
randomized complete blocks or complete Latin-square designs, except to 
remember that randomization is necessary. 

(b) In planning confounded factorial designs, the main principle is to 
restrict the confounding to high-order interactions, if this is possible. 
One principle must always be remembered with $2^n$ designs: if two 
interactions are confounded, then a third effect is also confounded; this third 
effect is formed by casting out all like letters in the first two. For 
example, if ABC and ABD are confounded, CD will be also. Hence it is 
often necessary to adjust the confounding so as to protect main effects 
and two-factor interactions. For more than two levels, the principles of 
confounding are much more complicated. We shall present a few examples 
of how to construct confounded factorials (see Yates® for the details 
of other examples). 

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Example 20.6. If a design is of the 2” character (n factors each at two 
levels), the construction of a confounded design is simplified by the use 
of the ( +, —) system. For example if we wish to use 2‘^ (= 16) treatments 
in 4 blocks of 4 treatments each, we are led to confound 3 degrees of 
freedom (the number of degrees of freedom between blocks in a repli¬ 
cation) . Let us designate the 4 factors SiS A, B, C, and D, with 1 standing 
for the low level and 2 the high level of a factor. We cannot confound 
on ABCD, because if we select any 3-factor interaction, we shall auto¬ 
matically confound a main effect by the above rule. Hence we consider 
confounding two of the four 3-factor interactions and one 2-factor inter¬ 
action, for example, ABC, ABD, and CD. We set up the (+, —) 
system for each of these three effects as in Table 20.1. We note that 





Table 20.1

                                      Block
Treatment    ABC   ABD   CD      1    2    3    4

  1111        -     -     +      X
  2111        +     +     +           X
  1211        +     +     +           X
  2211        -     -     +      X
  1121        +     -     -                X
  2121        -     +     -                     X
  1221        -     +     -                     X
  2221        +     -     -                X
  1112        -     +     -                     X
  2112        +     -     -                X
  1212        +     -     -                X
  2212        -     +     -                     X
  1122        +     +     +           X
  2122        -     -     +      X
  1222        -     -     +      X
  2222        +     +     +           X

there is a + for the treatment having the high level of all the factors in a 
particular interaction, - for one low level and the remainder high levels, 
etc.; that is, the sign is + when an even number of the factors in the 
interaction are at the low level and - when an odd number are. Also, 
CD = (ABC)(ABD). The treatments with the same combination of (+, -) 
signs are assigned to the same block. This method can be used for all 
$2^n$ designs. 
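As a hedged sketch of the (+, -) method just described, the following Python groups the 16 treatments of the $2^4$ design by their signs on ABC, ABD, and CD. The names `sign` and `confounded_blocks` are illustrative assumptions, and the sign rule is taken to be + for an even number of low levels among the factors of the effect.

```python
from itertools import product

FACTORS = "ABCD"
CONFOUNDED = ["ABC", "ABD", "CD"]


def sign(treatment, effect):
    """Return +1 or -1 for a treatment (tuple of levels 1/2) on one effect."""
    lows = sum(1 for f in effect if treatment[FACTORS.index(f)] == 1)
    return 1 if lows % 2 == 0 else -1


def confounded_blocks():
    """Group treatments that share the same signs on every confounded effect."""
    blocks = {}
    for treatment in product((1, 2), repeat=len(FACTORS)):
        key = tuple(sign(treatment, effect) for effect in CONFOUNDED)
        blocks.setdefault(key, []).append("".join(map(str, treatment)))
    return blocks


for signs, members in confounded_blocks().items():
    print(signs, members)
```

Each of the four sign patterns collects 4 treatments, and in every pattern the CD sign is the product of the ABC and ABD signs, which is the numerical form of CD = (ABC)(ABD).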

Example 20.6. Consider a $3^2$ design with the levels designated as 
1, 2, and 3. The 9 treatments can be arranged in a 3 X 3 table. 

        11    12    13
        21    22    23
        31    32    33

The AB interaction (4 degrees of freedom) can be split into two 
components, each with 2 degrees of freedom. We compute 

$$I_0 = (11 + 22 + 33), \quad I_1 = (12 + 23 + 31), \quad I_2 = (13 + 21 + 32),$$
$$J_0 = (11 + 23 + 32), \quad J_1 = (12 + 21 + 33), \quad J_2 = (13 + 22 + 31).$$

The two components are 



If we wanted to use only 3 treatments per block, we could confound 
either the I or the J part of the AB interaction by planting the I or J 
combination of treatments in blocks. We note that these combinations 
do not confound the main effects, since all 3 levels of each factor are 
present in each combination. (See reference 3 for the rules when 
n > 2.) 
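The I and J groupings can be generated mechanically. The sketch below is an illustration only: it assumes that the I sets are indexed by (b - a) mod 3 and the J sets by (a + b - 2) mod 3 for treatment ab, an indexing chosen here because it reproduces the six sets listed above. The function name `components` is hypothetical.

```python
from itertools import product

LEVELS = (1, 2, 3)


def components():
    """Return the I and J sets of treatments 'ab' for a 3^2 design."""
    i_sets = {k: [] for k in range(3)}
    j_sets = {k: [] for k in range(3)}
    for a, b in product(LEVELS, repeat=2):
        i_sets[(b - a) % 3].append(f"{a}{b}")
        j_sets[(a + b - 2) % 3].append(f"{a}{b}")
    return i_sets, j_sets


i_sets, j_sets = components()
for name, sets in (("I", i_sets), ("J", j_sets)):
    for k, members in sets.items():
        # Every set holds each level of A and each level of B exactly once,
        # so planting one set per block leaves the main effects unconfounded.
        levels_a = sorted(m[0] for m in members)
        levels_b = sorted(m[1] for m in members)
        print(f"{name}{k}", members, levels_a == levels_b == ["1", "2", "3"])
```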

20.8. Summary. We have presented only an introduction to factorial 
designs in this chapter, but it is hoped that the reader has been able to see 
the usefulness of these designs. Yates³ and Cochran and Cox¹ have an 
extensive amount of material on factorial designs, but even they do not 
present an exhaustive list; more needs to be done in developing con¬ 
founded factorial designs for the engineers and the physical scientists, 
who require three or more levels of each treatment and often more sets 
of treatments than are used by the natural scientists. Too little atten¬ 
tion has been paid to encouraging experimenters to use three or more 
levels of each treatment in an experiment in order to discover whether 
or not there is curvilinearity in the response curves. The $2^n$ factorials 
have been used because of their simplicity, but they furnish little infor¬ 
mation on the response surface. 

Some work has been done on fractional replication¹¹⁻¹⁴ in which only 
part of the treatment combinations are used. Fractional replication is 
needed in industrial experimentation, especially in destructive testing. 
Too little attention has been paid to the development of these designs, 
especially for mixed series, such as a 2 X 3 X 4 experiment. 

EXERCISES 

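20.1. (a) In Sec. 20.3, show that 

$$E(L_1^2)/4r = E(\mathrm{SSA}), \quad E(L_2^2)/4r = E(\mathrm{SSC}), \quad E(L_3^2)/4r = E[\mathrm{SS}(AC)].$$

Hint: Remember that $\alpha_1 + \alpha_2 = 0$, $\gamma_1 + \gamma_2 = 0$, etc. 

(b) In Example 20.3, show that 

$$E\left[\frac{F_l^2}{24} + \frac{F_q^2}{72}\right] = E(\mathrm{SSF}).$$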

20.2. (a) In Example 20.1, what was the estimated efficiency of the 
randomized complete-blocks design as compared with a completely 
randomized design? Would you consider this to be significantly greater 
than 1? 

(b) Reanalyze the data, using the following orthogonal sets: $V_1 - V_2$; 
$V_1 + V_2 - 2V_3$; $N_2 - N_1$; $2N_3 - (N_1 + N_2)$; interactions of the V 
and N comparisons. What does this analysis reveal? 

20.3. An experiment was designed to compare five varieties of cowpeas, 
at three different spacings, 4, 8, and 12 inches apart in row, with rows 
3 feet apart.⁵ For the data, see the accompanying table. 

Data of Exercise 20.3 
Yield of Cowpea Hay (Pounds per Morgen Plot) 

                              Blocks
Variety      Spacing      I     II    III    IV      Total    Subtotal

New Era        4"        56     45    43     46       190
               8"        60     50    45     48       203       616
              12"        66     57    50     50       223

34 C 361       4"        61     58    55     56       230
               8"        60     59    54     54       227       674
              12"        59     55    51     52       217

34 C 395       4"        63     53    49     48       213
               8"        65     56    50     50       221       665
              12"        66     58    52     55       231

34 C 402       4"        65     61    60     63       249
               8"        60     58    56     60       234       692
              12"        53     53    48     55       209

34 C 408       4"        60     61    50     53       224
               8"        62     68    67     60       257       773
              12"        73     77    77     65       292

Block totals            929    869   807    815     3,420

(a) Set up a table of means. 

(b) Set up the analysis of variance. 

(c) Derive the linear and quadratic components of the spacing effect 
and the interaction. Graph these results, using spacing as the X variate 
and drawing separate lines for each variety. 

(d) What is the standard error to compare any 2 of the 15 treatment 
means? 

(e) Discuss the results. 

20.4. A fertility test was made on the growth of grass on Philadelphia 
Flat soils in the Manti National Forest with three levels of nitrogen (N) 
and three levels of phosphate (P) with two samples of each treatment. 
For the data, see the accompanying table. Note that this is not a 
randomized-blocks experiment but is analyzed as a completely randomized 
design of the type discussed in Sec. 18.2. 

Data of Exercise 20.4 
Grams of grass 

               N0                      N1                      N2
         Samples     Sum         Samples     Sum         Samples     Sum       Total

P0     18.7  17.5    36.2      20.8  20.5    41.3      22.3  22.9    45.2      122.7
P1     19.2  21.3    40.5      18.8  23.5    42.3      24.9  24.2    49.1      131.9
P2     20.8  20.5    41.3      22.0  24.0    46.0      25.6  27.1    52.7      140.0

Total               118.0                   129.6                   147.0      394.6

(a) Set up the table of means, and make the analysis of variance. 

(b) Show that the only important effects are the linear for both N 
and P. 

(c) Draw a graph similar to Fig. 20.2, and discuss the results. 

20.5. Suppose you wish to set up an experiment to test the effectiveness 
of 2 levels of nitrogen, 2 levels of phosphate, and 2 levels of potash on the 
yield of potatoes and had enough land to plant 80 plots. 

(a) Show how you would set up this experiment. 

(b) Set up the analysis-of-variance table. 

(c) Indicate what kind of information can be obtained from such an 
experiment. 

(d) If you wanted to know something about the maximum level of the 
three fertilizers to use, what changes have to be made in planning another 
experiment? 

(e) How would you take account of the cost of the fertilizers in making 
your recommendations to the farmer? 

20.6. A 6 × 6 Latin-square experiment was run at the North Carolina 
Agricultural Experiment Station to determine the effect of nitrogen and 
phosphate fertilizers on potato yields. The following treatments were 
used: 

                        Nitrogen
Phosphate       Low     Medium     High

Low              A         B         C
High             D         E         F

The field arrangement and yields (pounds per plot) were: 

                                                                  Row totals

   E 633     B 527     F 652     A 390     C 504     D 416          3,122
   B 489     C 475     D 415     E 488     F 571     A 282          2,720
   A 384     E 481     C 483     B 422     D 334     F 646          2,750
   F 620     D 448     E 505     C 439     A 323     B 384          2,719
   D 452     A 432     B 411     F 617     E 594     C 466          2,972
   C 500     F 505     A 259     D 366     B 326     E 420          2,376

Col. totals
   3,078     2,868     2,725     2,722     2,652     2,614         16,659


(a) Make a complete analysis of this experiment, indicating the treat¬ 
ment means and effects and linear and quadratic effects. Graph the 
results with nitrogen as the X variate and using separate lines for each 
phosphate. 

(b) What recommendations should be made if it costs $2 per plot extra 
for high instead of low nitrogen and $1 per plot extra for each jump in the 
amount of phosphate and potatoes sold for $.03 a pound? 

(c) Was the Latin square better than a randomized complete-blocks 
design? 

20.7. An experiment was conducted by C. H. Li to study the effect of 
electrolytic chromium plate as a source for the chromium impregnation of 
low-carbon steel wire.⁶ Eighteen treatments were considered, using all 
combinations of three diffusion temperatures (2200°F, 2350°F, 2500°F), 
three diffusion times (4, 8, and 12 hours), and two degassing treatments 
[no degassing (0) and degassing (1)]. Each treatment was used on 4 
wires, giving a total of 72 wires used. The variable studied was average 
resistivity in microhms per cubic centimeter. The average resistivities 
are shown in the accompanying table. 

Data of Exercise 20.7 

Temperature          2200°             2350°             2500°
Degassing           0      1          0      1          0      1

4 hours           18.1   17.9       22.1   21.2       22.9   22.8
                  18.9   18.0       20.2   20.4       24.0   22.3
                  18.6   18.7       21.3   21.2       23.0   22.7
                  19.1   19.0       22.6   21.2       23.0   23.3
   Total          74.7   73.6       86.2   84.0       92.9   91.1

8 hours           19.2   19.2       23.2   22.7       25.5   26.9
                  19.3   19.0       21.8   22.7       26.6   26.9
                  20.7   20.4       22.9   22.5       25.9   26.3
                  20.4   19.2       22.3   22.5       26.8   26.9
   Total          79.6   77.8       90.2   90.4      104.8  107.0

12 hours          20.0   19.9       23.9   23.3       27.0   26.5
                  20.2   20.1       23.6   23.5       26.2   26.8
                  20.1   20.0       23.2   23.5       25.9   25.4
                  20.5   20.8       23.7   22.9       26.9   27.2
   Total          80.8   80.8       94.4   93.2      106.0  105.9


(a) Set up a table of means for the 18 treatment combinations and for 
each of the main effects. 

(b) Make an analysis of variance with single degrees of freedom 
($Sy^2 = 524.2550$). 

(c) Determine the standard error for the difference between any 2 of 
the 18 treatments. 

(d) What are your conclusions regarding these treatments? 

20.8. A randomized blocks experiment was set up to test the 
performance of 16 treatment combinations in two different blocks, each block 
having 4 rows with 4 treatments per row. The original setup was a 
factorial experiment. It turned out that only 2 of the main treatments, 
designated as L and O, were important. It also became evident that the 
blocks were badly placed, since there were marked differences in fertility 
between the 4 rows, which ran across both blocks. In the analysis of the 
difference between the effects of L and O, we are confronted with the 
difficulty that L and O did not each appear twice in each row of each block 
but sometimes L would appear 3 times and O once, or vice versa. The 
yields are presented below: 

              Block I                        Block II                       Total
Row       L             O                L              O               L       O      L + O

1      84, 70, 81       66             63, 97         56, 64           395     186       581
2      146, 171      148, 137           189        168, 158, 152       506     763     1,269
3        247       179, 218, 228      195, 189       191, 179          631     995     1,626
4      177, 153      123, 166       145, 141, 130       133            746     422     1,168

Total    1,129        1,265            1,149          1,101          2,278   2,366     4,644

Block total     2,394                         2,250



(a) Set up the least-squares equations to test whether or not there was 
a real difference between L and O after adjusting for row effects and 
neglecting the other treatments. 

(b) Show that the following analysis of variance is obtained: 

Source of Variation        Sum of Squares

Blocks                           648.00
Rows                          70,542.25
Treatments (adj.)              1,313.21
Error                          8,006.04

(c) Fill in the appropriate degrees of freedom in (b), and make a test of 
the treatment differences. 

(d) Is there any feature of the error which might cause you to doubt 
the validity of the analysis? 

20.9. Use the method presented in Sec. 15.3 (page 198) to solve for 
the values of m, $\{a_i\}$, and $\{c_j\}$ in Example 20.4. 

20.10. C. B. Ratchford⁹ investigated the differences in average man 
work units per farm for 114 nontractor farms in the coastal plains counties 
of North Carolina (1949). He studied three factors which might have 
influenced the number of man work units per farm: size of farm (small, 
medium, or large), type of farming (tobacco or general), and three types 
of rental arrangements. The total man work units were as shown in the 
accompanying table (number of farms in parentheses). The total sum 
of squares within classes was 1,201,641. 

Data of Exercise 20.10 

Type of      Size of              Type of rental arrangement
farming      farm            1                2                3                Total

Tobacco      Small       7,770.7 (26)     2,837.1 (9)      5,410.7 (19)     16,018.5 (54)
             Medium      2,492.5 (8)      1,239.6 (3)        359.6 (1)       4,091.7 (12)
             Large       1,654.6 (4)        403.3 (1)            (0)         2,057.9 (5)
             Total      11,917.8 (38)     4,480.0 (13)     5,770.3 (20)     22,168.1 (71)

General      Small       2,081.1 (9)      1,443.7 (4)      2,252.3 (13)      5,777.1 (26)
             Medium      1,256.6 (4)      1,179.0 (3)      2,142.7 (5)       4,578.3 (12)
             Large         680.8 (2)        958.7 (2)        361.1 (1)       2,000.6 (5)
             Total       4,018.5 (15)     3,581.4 (9)      4,756.1 (19)     12,356.0 (43)

Grand total             15,936.3 (53)     8,061.4 (22)    10,526.4 (39)     34,524.1 (114)

(a) Derive the total sum of squares due to the main effects of size, type, 
and rental arrangement, not adjusted for interaction. Write the variables 
in this order in the matrix: rental arrangement, type, size. Sweep out the 
rental constants, and solve for the type and size constants: type (adjusted 
for rental) and size (adjusted for rental and type). 

(b) Derive the interaction sum of squares, and show that there was no 
significant interaction. What does this tell you about the separate 
two-factor and three-factor interactions? 

(c) Test the effect of size adjusted for rental arrangement and type of 
farming. 

(d) How would you go about testing the other two adjusted main 
effects? 

20.11. Read an article by Anderson and Manning¹⁰ for a more complete 
discussion of matrix methods with unequal frequencies. 

20.12. Use the observation equations for each of the eight subclasses to 
set up the normal equations in Example 20.4 and Exercise 20.10. 

20.13. Show that the method of least squares produces exact tests for 
the main effects, even when interaction is present, if $n_{ij} = n_{i.}n_{.j}/n_{..}$. 

20.14. Suppose you studied only small and medium-sized farms in 
Exercise 20.10. 

(a) Make an analysis by the method of unweighted means in this case. 

(b) Does this analysis give you any clue to the separate interactions of 
Exercise 20.10? 

(c) Why could you not use this method for all farms? 

20.15. Given the following data on the number of plants emerging in an 
experiment with two levels each of nitrogen (N), phosphate (P), and 
potash (K). Four treatments were used per block, with a total of 8 
blocks (4 replications), and the NPK interaction was confounded. The 
stands were as follows: 

NPK        1a      2a      3a      4a      Total

211        31      30      33      28       122
121        25      24      30      19        98
112        21      21      30      24        96
222        66      39      41      36       182

Total     143     114     134     107       498

NPK        1b      2b      3b      4b      Total

111        11       7      19      13        50
212        33      31      36      31       131
122        29      27      31      26       113
221        43      39      36      35       153

Total     116     104     122     105       447

(a) Show that the following analysis is correct: 

Source of        Degrees of        Mean
variation        freedom           square

Blocks               7               50.10
N                    1            1,667.53
P                    1              675.28
K                    1              306.28
NP                   1                9.03
NK                   1               16.53
PK                   1                3.78
Error               18               32.05

(b) Test the significance of the main effects. What about the 
interactions? 

(c) Show that the main effects and two-factor interactions are not 
confounded with block effects and that NPK is completely confounded. 
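The single-degree-of-freedom sums of squares in Exercise 20.15 can be checked from the treatment totals. The following is a minimal sketch, not the authors' worked solution; it assumes the usual $2^n$ contrast in which a treatment enters with sign + when an even number of the factors in the effect are at the low level, and the names `TOTALS` and `effect_ss` are introduced here for illustration.

```python
# Treatment totals over the 4 replications (digits give the N, P, K levels).
TOTALS = {
    "211": 122, "121": 98, "112": 96, "222": 182,   # blocks 1a-4a
    "111": 50, "212": 131, "122": 113, "221": 153,  # blocks 1b-4b
}
N_PLOTS = 32  # 8 treatments x 4 replications


def effect_ss(effect: str) -> float:
    """Sum of squares (1 d.f.) for an effect such as 'N' or 'NP'."""
    contrast = 0
    for treatment, total in TOTALS.items():
        lows = sum(1 for f in effect if treatment["NPK".index(f)] == "1")
        contrast += total if lows % 2 == 0 else -total
    return contrast ** 2 / N_PLOTS


for effect in ("N", "P", "K", "NP", "NK", "PK"):
    print(f"{effect:>3}  {effect_ss(effect):10.2f}")
```

Calling `effect_ss("NPK")` gives the contrast between the two sets of blocks, which is why that effect cannot be separated from block differences.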

20.16. Two different cultivation methods were also used in the 
experiment presented in Example 20.3, giving a total of 12 treatments, but only 
6 treatments were planted per block. The treatment effects can be 
divided as follows, with the number of degrees of freedom per effect in 
parentheses: F(2), V(1), FV(2), C(1), FC(2), VC(1), FVC(2). The two 
cultivation methods ($c_1$ and $c_2$) were assigned in the following order for 
each block (see the example for F and V): 

                        Block
F    V       1    2    3    4    5    6

1    1       2    1    1    2    1    2
1    2       1    2    2    1    2    1
2    1       1    2    2    1    1    2
2    2       2    1    1    2    2    1
3    1       1    2    1    2    2    1
3    2       2    1    2    1    1    2

We note that there are three complete replications of the 12 treatments. 
The total yields of three plots for each of the 12 treatments were: 

Treatment    111   112   121   122   211   212   221   222   311   312   321   322    Total

Yield        411   444   561   657   479   578   659   645   451   517   546   601    6,549

(a) Check that the above treatment totals are correct. 

(b) Write out 11 treatment effects, using $F_l$ and $F_q$ as before. Show 
that 8 of these are independent of block effects, and show that the sum of 
squares attributable to these 8 effects is 25,977.84. 

(c) Show that VC, $F_lVC$, and $F_qVC$ are not independent of block 
effects; these effects are confounded with block effects. 

(d) Show that the sums of squares for the effects in (c) adjusted for 
blocks are 40.5, 154.1, and 129.6, respectively. 

(e) Compute the new error mean square, and make the necessary tests 
of significance. 

References Cited 

1. Cochran, W. G., and G. M. Cox, Experimental Designs, John Wiley & Sons, Inc., 
New York, 1950. 

2. Fisher, R. A., Design of Experiments, 4th ed., Oliver & Boyd, Ltd., Edinburgh & 
London, 1947. 

3. Yates, F., “The Design and Analysis of Factorial Experiments,” Imp. Bur. Soil 
Sci., Tech. Commun. 35 (1937). 

4. Snedecor, G. W., Statistical Methods, 4th ed., Collegiate Press, Inc., of Iowa 
State College, Ames, Iowa, 1946. 

5. Saunders, A. R., and A. A. Rayner, “Statistical Methods with Special 
References to Field Experiments,” 3d ed., S. Africa Dept. Agr. Sci. Bull. 200 (1951). 

6. Li, C. H., Steel Chromizing and Master Charts for Diffusion in Cylindrical Media, 
unpublished thesis, Library, Purdue University, 1951. 

7. Yates, F., “The Analysis of Multiple Classifications with Unequal Numbers in 
the Different Subclasses,” J. Am. Stat. Assoc., 29:51-66 (1934). 

8. Snedecor, G. W., and G. M. Cox, “Disproportionate Subclass Numbers in 
Tables of Multiple Classification,” Iowa State Coll., Agr. Expt. Sta. Bull. 180 
(1935). 

9. Ratchford, C. B., Rental Arrangements in a Developing Economy, unpublished 
thesis, Duke University, Durham, N.C., 1951. 

10. Anderson, R. L., and H. L. Manning, “An Experimental Design Used to 
Estimate the Optimum Planting Date for Cotton,” Biometrics, 4:171-196 (1948). 

11. Finney, D. J., “The Fractional Replication of Factorial Arrangements,” Ann. 
Eugenics, 12:291-301 (1945). 

12. Finney, D. J., “Recent Developments in the Design of Field Experiments. 
III. Fractional Replication,” J. Agr. Sci., 36:184-191 (1946). 

13. Kempthorne, O., “A Simple Approach to Confounding and Fractional 
Replication in Factorial Experiments,” Biometrika, 34:255-272 (1947). 

14. Davies, O. L., and W. A. Hay, “The Construction and Use of Fractional Factorial 
Designs in Industrial Research,” Biometrics, 6:233-249 (1950). 

Other Reading 

Barnard, M. M., “An Enumeration of the Confounded Arrangements in the $2^n$ 
Factorial Designs,” J. Roy. Stat. Soc. Suppl., 3:195-202 (1936). 

Bliss, C. I., “Factorial Design and Covariance in the Biological Assay of Vitamin D,” 
J. Am. Stat. Assoc., 36:498-506 (1940). 

Bose, R. C., “Mathematical Theory of the Symmetrical Factorial Design,” Sankhyā, 
8:107-166 (1947). 

Cameron, J. M., and W. J. Youden, “The Selection of a Limited Number from Many 
Possible Conditioning Treatments for Alloys to Achieve Best Coverage and 
Statistical Evaluation,” Proc. ASTM, 60:951-960 (1950). 

Finney, D. J., “The Construction of Confounded Arrangements,” Empire J. Exptl. 
Agr., 16:107-112 (1947). 

Fisher, R. A., “The Theory of Confounding in Factorial Experiments in Relation 
to the Theory of Groups,” Ann. Eugenics, 11:341-353 (1942). 

Li, J. C. R., “Design and Statistical Analysis of Some Confounded Factorial 
Experiments,” Iowa State Coll. Agr. Expt. Sta. Bull. 333 (1944). 

Nair, K. R., “On a Method of Getting Confounded Arrangements in the General 
Symmetrical Type of Experiments,” Sankhyā, 4:121-138 (1938). 

Nair, K. R., “Balanced Confounded Arrangements for the $5^n$ Type of Experiments,” 
Sankhyā, 6:57-70 (1940). 

Yates, F., “The Analysis of Replicated Experiments When the Field Results Are 
Incomplete,” Empire J. Exptl. Agr., 1:129-142 (1933). 


CHAPTER 21 

THE ANALYSIS OF COVARIANCE 

21.1. Introduction. In Chaps. 18 to 20, we considered various types 
of experimental designs to estimate treatment effects and to test for differ¬ 
ences among these treatments. Frequently the experimenter wishes to 
make these estimates and tests on some dependent variate after adjusting 
for the effects of one or more fixed variates. For example, he might wish 
to test the effectiveness of various rations on the gains in weight of hogs 
after adjustments have been made for the initial weights of the hogs. 
Other fixed variates may be used in order to study the effect of the rations 
adjusted for the effects of such external factors as temperature of the pens 
or sunlight. An agronomic experiment might be improved by adjusting 
crop yield for weather and soil conditions or for unequal stand. In an 
educational experiment to test for differences between teaching methods, 
it might be advisable to adjust the results for the mental age or for some 
test score secured before the experiment started. Cochran has discussed 
the theoretical and practical aspects of the analysis of covariance in 
references 1 and 2. R. A. Fisher first introduced the method in reference 
3. 

21.2. The Use of Simple Covariance for a Randomized Complete-blocks 
Experiment 

(i) We shall not attempt to present the theory of covariance for all of 
the experimental designs of Chap. 18, since it is believed that the method 
can be adequately demonstrated with the randomized complete-blocks 
type of experiment. As in Sec. 18.3, we shall assume that the 
experimenter has p treatments, each assigned at random to an individual plot in 
each of r blocks. The variate to be estimated is designated as Y, and the 
fixed variate for which Y is to be adjusted is designated as X. The 
experimental model for the yield of the ith treatment in the jth block is 

$$Y_{ij} = \mu + \tau_i^* + \beta_j^* + \beta x_{ij} + \epsilon_{ij} = m + t_i^* + b_j^* + b x_{ij} + e_{ij},$$

where $x_{ij} = X_{ij} - \bar{X}$, and $\tau_i^*$ and $\beta_j^*$ are the treatment and block effects 
adjusted for the effect of the X's. Since $Sx = 0$, it is obvious that the 
general mean needs no adjustment for X; hence, we shall use $\mu$ and its 
estimate, m, in the theory which follows. As usual 

$$\sum_i \tau_i^* = \sum_j \beta_j^* = 0.$$


The X's are assumed to be fixed, and hence not influenced by the 
treatments. In case X has some effect on the yield (Y) and the experimenter 
was not able to maintain the same value of X for all treatments, it is 
desirable to estimate the treatment effects after the yields have been 
adjusted for the effect of X. If the values of X are actually influenced 
by the treatments but can be measured without error, an analysis of 
covariance can still be run but interpretations are often quite difficult. 
As indicated in Chap. 15, if we estimate $\mu$, $\tau_i^*$, $\beta_j^*$, and $\beta$ by the method of 
least squares, each of the first three estimates will be adjusted for the 
linear effect of X. It should be emphasized at this point that we are here 
considering only the linear effect of X; however, any other function of X 
could be used as the fixed variate, and we might consider several fixed 
variates in a multiple covariance. 

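(ii) The least-squares equations for m, $t_i^*$, $b_j^*$, and b, respectively, are 

$$rpm = SY,$$
$$rm + rt_i^* + bx_{i.} = T_i,$$
$$pm + pb_j^* + bx_{.j} = B_j,$$
$$\sum_i t_i^* x_{i.} + \sum_j b_j^* x_{.j} + b\,Sx^2 = SxY = Sxy,$$

where $x_{i.} = \sum_j x_{ij} = X_{i.} - r\bar{X}$, $x_{.j} = \sum_i x_{ij} = X_{.j} - p\bar{X}$, and $T_i$ and $B_j$ 
were defined in Sec. 18.3. In other words, $x_{i.}$ and $x_{.j}$ are treatment and 
block sums for x. It is seen that $\sum_i x_{i.} = \sum_j x_{.j} = 0$. 

The solutions to the least-squares equations are 

$$m = \bar{Y}, \qquad b = \frac{E_{xy}}{E_{xx}},$$
$$t_i^* = (\bar{T}_i - \bar{Y}) - b\bar{x}_{i.} = t_i - b\bar{x}_{i.},$$
$$b_j^* = (\bar{B}_j - \bar{Y}) - b\bar{x}_{.j} = b_j - b\bar{x}_{.j},$$

where $E_{xx}$ and $E_{xy}$ are similar to SSE but applied to $x^2$ and $xy$, $\bar{x}_{i.} = x_{i.}/r$, 
$\bar{x}_{.j} = x_{.j}/p$, and similarly for $\bar{T}$ and $\bar{B}$. 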

We see that an adjusted treatment effect ($t_i^*$) is estimated by 
subtracting an adjustment factor from the unadjusted effect ($t_i$). The adjustment 
factor, $b\bar{x}_{i.}$, is simply the average change in Y for a unit change in X 
multiplied by the difference between the treatment mean of X and 
$\bar{X}$ ($\bar{x}_{i.} = \bar{X}_{i.} - \bar{X}$). Hence each treatment effect is adjusted to the 
effect expected if all treatments had operated with the mean value of X. 

It can be shown that 

$$E_{xy} = S(x_{ij} - \bar{x}_{.j} - \bar{x}_{i.})(y_{ij} - b_j - t_i) = S(x_{ij} - \bar{x}_{.j} - \bar{x}_{i.})\,y_{ij}.$$

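(iii) The above estimates can be put in terms of the parameters and 
$\{\epsilon_{ij}\}$ as follows: 

$$\bar{Y} = \mu + \bar{\epsilon},$$
$$y_{ij} = \tau_i^* + \beta_j^* + \beta x_{ij} + \epsilon_{ij} - \bar{\epsilon},$$
$$\bar{B}_j = \mu + \beta_j^* + \beta\bar{x}_{.j} + \bar{\epsilon}_{.j},$$
$$\bar{T}_i = \mu + \tau_i^* + \beta\bar{x}_{i.} + \bar{\epsilon}_{i.},$$
$$b = \frac{E_{xy}}{E_{xx}} = \beta + \frac{S(x_{ij} - \bar{x}_{.j} - \bar{x}_{i.})(\epsilon_{ij} - \bar{\epsilon}_{.j} - \bar{\epsilon}_{i.} + \bar{\epsilon})}{E_{xx}} = \beta + \frac{S(x_{ij} - \bar{x}_{.j} - \bar{x}_{i.})\,\epsilon_{ij}}{E_{xx}} = \beta + \epsilon_b,$$
$$t_i^* = \tau_i^* + \bar{\epsilon}_{i.} - \bar{\epsilon} - \bar{x}_{i.}\epsilon_b,$$
$$b_j^* = \beta_j^* + \bar{\epsilon}_{.j} - \bar{\epsilon} - \bar{x}_{.j}\epsilon_b,$$

where $\bar{\epsilon} = S\epsilon/rp$, $\bar{\epsilon}_{.j} = \sum_i \epsilon_{ij}/p$, and $\bar{\epsilon}_{i.} = \sum_j \epsilon_{ij}/r$. 

Hence $\bar{Y}$, b, $t_i^*$, and $b_j^*$ are unbiased estimates of $\mu$, $\beta$, $\tau_i^*$, and $\beta_j^*$, 
respectively. 

An unbiased estimate of an adjusted treatment mean is 

$$t_i^* + \bar{Y} = \tau_i^* + \mu + \bar{\epsilon}_{i.} - \bar{x}_{i.}\epsilon_b.$$

Similarly the estimate of the difference, $\delta^*$, between two treatment means 
is 

$$d^* = t_j^* - t_i^* = (\tau_j^* - \tau_i^*) + (\bar{\epsilon}_{j.} - \bar{\epsilon}_{i.}) - (\bar{x}_{j.} - \bar{x}_{i.})\epsilon_b.$$

The variance of the difference between two treatment means is 

$$\sigma^2(d^*) = \sigma^2\left[\frac{2}{r} + \frac{(\bar{x}_{j.} - \bar{x}_{i.})^2}{E_{xx}}\right].$$

In computing this variance, we use $\bar{x}_{j.} - \bar{x}_{i.} = \bar{X}_{j.} - \bar{X}_{i.}$. It might be 
noted that in computing this variance, we computed $\sigma^2(b) = \sigma^2/E_{xx}$. 
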
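(iv) From least-squares theory, we know that the residual sum of 
squares is given by 

$$\mathrm{SSE}^* = Sy^2 - \sum_i t_i^* T_i - \sum_j b_j^* B_j - b\,Sxy = Sy^2 - \frac{1}{r}\Big[\sum_i T_i^2 - rC\Big] - \frac{1}{p}\Big[\sum_j B_j^2 - pC\Big] - bE_{xy},$$

where $C = (SY)^2/rp$ and $Sy^2 = SY^2 - C$. But this is the usual error 
sum of squares for the analysis of variance (Sec. 18.3), 

$$\mathrm{SSE} = Sy^2 - \mathrm{SSB} - \mathrm{SST},$$

minus the reduction due to regression when x and y are adjusted for block 
and treatment effects. We see that 

$$\hat{Y}_{ij} = \bar{Y} + t_i^* + b_j^* + bx_{ij} = \mu + \tau_i^* + \beta_j^* + \beta x_{ij} + \bar{\epsilon}_{i.} + \bar{\epsilon}_{.j} - \bar{\epsilon} + (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j})\epsilon_b.$$

Hence the residual is 

$$e_{ij} = Y_{ij} - \hat{Y}_{ij} = (\epsilon_{ij} - \bar{\epsilon}_{i.} - \bar{\epsilon}_{.j} + \bar{\epsilon}) - (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j})\epsilon_b.$$

The expected value of $\mathrm{SSE}^* = Se_{ij}^2$ is 

$$(r - 1)(p - 1)\sigma^2 - \sigma^2 = [(r - 1)(p - 1) - 1]\sigma^2.$$

Therefore $s^{*2} = \mathrm{SSE}^*/[(r - 1)(p - 1) - 1]$ is an unbiased estimate of 
$\sigma^2$.† This estimated variance can be used to set up confidence limits for 
differences between adjusted treatment means. 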

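The added reduction due to treatments is found by omitting the 
treatment constants from the model equation. The new residual sum of 
squares is 

$$(\mathrm{SSE}^*)' = Sy^2 - \mathrm{SSB} - b'E'_{xy},$$

where 

$$b' = \frac{E'_{xy}}{E'_{xx}} = \frac{S(x_{ij} - \bar{x}_{.j})(y_{ij} - b_j)}{S(x_{ij} - \bar{x}_{.j})^2}.$$

This is simply SSE' ($= Sy^2 - \mathrm{SSB}$) minus the reduction due to regression 
when x and y are adjusted for block effects only. The added reduction 
due to treatments is 

$$\mathrm{SST}^* = (\mathrm{SSE}^*)' - \mathrm{SSE}^*.$$

Also, 

$$\hat{Y}'_{ij} = \bar{Y} + b'_j + b'x_{ij},$$
$$e'_{ij} = \tau_i^* + (\epsilon_{ij} - \bar{\epsilon}_{.j}) - \epsilon'_b(x_{ij} - \bar{x}_{.j}),$$

where $\epsilon'_b = S(x_{ij} - \bar{x}_{.j})\epsilon_{ij}/S(x_{ij} - \bar{x}_{.j})^2$. Hence the expected value of 
$(\mathrm{SSE}^*)'$ is 

$$r'\sum_i (\tau_i^*)^2 + [r(p - 1) - 1]\sigma^2.$$

Since $\mathrm{SST}^* = (\mathrm{SSE}^*)' - \mathrm{SSE}^*$, the expected value of $\mathrm{SST}^*$ is 

$$r'\sum_i (\tau_i^*)^2 + (p - 1)\sigma^2.$$

Hence 

$$E(\mathrm{MST}^*) = \sigma^2 + \frac{r'\sum_i (\tau_i^*)^2}{p - 1}.$$

Under the null hypothesis ($\tau_i^* = 0$), $\mathrm{SST}^*$ is distributed as $\chi^2\sigma^2$ with 
(p - 1) degrees of freedom and independently of $\mathrm{SSE}^*$, which is 
distributed as $\chi^2\sigma^2$ with (rp - r - p) degrees of freedom. Hence 

$$F = \frac{\mathrm{SST}^*/(p - 1)}{\mathrm{SSE}^*/(rp - r - p)} = \frac{\mathrm{MST}^*}{s^{*2}},$$

with (p - 1, rp - r - p) degrees of freedom, can be used to test the 
above null hypothesis. 

† $s^{*2}$ is usually designated as $s^2_{y \cdot x}$. 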

(v) The analysis of covariance table is: 

Source of       Degrees of           Original sums           Adjusted results
variation       freedom              Sy²    Sxy    Sx²       d.f.                   SS         MS

Total           rp - 1               Sy²    Sxy    Sx²
Blocks          r - 1                SSB    Bxy    Bxx
Treatments      p - 1                SST    Txy    Txx
Error           (r - 1)(p - 1)       SSE    Exy    Exx       (r - 1)(p - 1) - 1     SSE*       s*²
(Error)'        r(p - 1)             SSE'   E'xy   E'xx      r(p - 1) - 1           (SSE*)'

Adjusted treatments = (error)' - (error)                     p - 1                  SST*       MST*

In this analysis, 

(Error)' = error + treatments, 

$$B_{xy} = p\sum_j b_j\bar{x}_{.j} = \frac{1}{p}\sum_j B_jX_{.j} - \frac{(SY)(SX)}{rp},$$
$$T_{xy} = r\sum_i t_i\bar{x}_{i.} = \frac{1}{r}\sum_i T_iX_{i.} - \frac{(SY)(SX)}{rp},$$

and similarly $B_{xx}$ and $T_{xx}$ are block and treatment sums of squares for x. 






(vi) The analysis of covariance has two main uses: 

(a) To reduce the error variance by eliminating the plot-to-plot 
variation attributable to fluctuations in the fixed variate. 

(b) To eliminate any bias in treatment comparisons caused 
by an uneven distribution of the fixed variate to the various 
treatments. 

The efficiency (I) of covariance in reducing the error variance is 
given by the ratio of the variance of the difference between two 
unadjusted treatment means ($2s^2/r$) to the average variance of the difference 
between two adjusted treatment means.† Finney'^ shows that the 
average variance of the difference between two adjusted means is 

$$\frac{2s^{*2}}{r}\left[1 + \frac{T_{xx}}{(p - 1)E_{xx}}\right].$$

Hence 

$$I = \frac{s^2}{s^{*2}\left[1 + \dfrac{T_{xx}}{(p - 1)E_{xx}}\right]}.$$

It is rather difficult to assess the effectiveness of covariance in 
eliminating the effect of X on the treatment means. It has sometimes 
been stated that one might test for treatment differences in X by use of 
$F_x = (r - 1)T_{xx}/E_{xx}$. If $F_x$ is not significant, the experimenter is told 
that he can attribute adjusted yield differences to the treatments; 
however, if $F_x$ is significant, he is advised to be cautious, because adjusted 
yield differences might be attributed to differences in X. Actually the 
experimenter should consider whether treatment differences in X were 
inherent in the treatments (such as poor germination resulting in a low 
stand) or were the result of external circumstances. If the latter is the 
case, then covariance should be used to eliminate a bias in estimating 
treatment differences. But if the treatments actually produce 
differences in X, the experimenter should take this into account in making 
recommendations. 

† Since there is a loss of only 1 degree of freedom in adjusting for X, we need not 
worry about this feature unless there are very few degrees of freedom for error. 

Example 21.1. Snedecor^ presents an example of the analysis of 
covariance of the yield of sugar beets (Y) adjusted for stand (X). The 
yields in tons per acre and the stand in numbers of beets per plot are 
presented in Table 21.1. 
Table 21.1 

                                                     Block (j)                        Treatment sums and means
Fertilizer               (i)   Stand and
applied                        yield        1      2      3      4      5      6      X_i./T_i    Mean

None                      1    Stand       183    176    291    254    225    249      1,378      229.7
                               Yield       2.45   2.25   4.38   4.35   3.42   3.27     20.12      3.353
Superphosphate, P         2    Stand       356    300    301    271    288    258      1,774      295.7
                               Yield       6.71   5.44   4.92   5.23   6.74   4.74     33.78      5.630
Muriate of potash, K      3    Stand       224    258    244    217    192    236      1,371      228.5
                               Yield       3.22   4.14   2.32   4.42   3.28   4.00     21.38      3.563
P + K                     4    Stand       329    283    308    326    318    318      1,882      313.7
                               Yield       6.34   5.44   5.22   8.00   6.96   6.96     38.92      6.487
P + sodium nitrate, N     5    Stand       371    354    352    331    290    410      2,108      351.3
                               Yield       6.48   7.11   5.88   7.54   6.61   8.86     42.48      7.080
K + N                     6    Stand       230    221    237    193    247    250      1,378      229.7
                               Yield       3.70   3.24   2.82   2.15   5.19   4.13     21.23      3.538
P + K + N                 7    Stand       322    367    400    333    314    385      2,121      353.5
                               Yield       6.10   7.68   7.37   7.83   7.75   7.39     44.12      7.353

Block sums                     X_.j      2,015  1,959  2,133  1,925  1,874  2,106     12,012      286.0
                               B_j       35.00  35.30  32.91  39.52  39.95  39.35     222.03      5.286

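The computations are as follows:

$$SX^2 = 3{,}587{,}590, \qquad \frac{(SX)^2}{rp} = 3{,}435{,}432, \qquad Sx^2 = 152{,}158;$$

$$SXY = 67{,}664.27, \qquad \frac{(SX)(SY)}{rp} = 63{,}500.58, \qquad Sxy = 4{,}163.69;$$

$$SY^2 = 1{,}316.1479, \qquad \frac{(SY)^2}{rp} = 1{,}173.7457, \qquad Sy^2 = 142.4022.$$

$$B_{xy} = \frac{(2{,}015)(35.00) + \cdots + (2{,}106)(39.35)}{7} - 63{,}500.58 = -116.56,$$

$$T_{xy} = \frac{(1{,}378)(20.12) + \cdots + (2{,}121)(44.12)}{6} - 63{,}500.58 = 3{,}598.05,$$

$$E_{xy} = 4{,}163.69 + 116.56 - 3{,}598.05 = 682.20.$$

$$E_{xx} = 28{,}665.10, \qquad b = \frac{E_{xy}}{E_{xx}} = .023799.$$

$$SSE^* = Sy^2 - SSB - SST - bE_{xy} = 142.4022 - 6.3134 - 112.8562 - 16.2357 = 6.9969.$$
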



The results are presented in the following analysis-of-covariance table:

Source of           Degrees of            Original sums                       Adjusted results
variation           freedom       Sy²        Sxy         Sx²             SS       d.f.     MS
Total                  41      142.4022   4,163.69   152,158.00
Blocks                  5        6.3134    -116.56     7,472.57
Treatments              6      112.8562   3,598.05   116,020.33
Error                  30       23.2326     682.20    28,665.10        6.9969     29     .2413
(Error)'               36      136.0888   4,280.25   144,685.43        9.4655     35
Adjusted treatments = (error)' - error                                 2.4686      6     .4114

$$F = \frac{.4114}{.2413} = 1.70, \qquad \alpha > .10.$$



Since F is not significant, we conclude that the treatments did not differ
in their mean yields after adjusting for stand. This experiment falls in
the uncertain class, because the stand is a result of the experiment and
hence may be a treatment characteristic. For example, a given treatment
may produce a good germination or a good start on the part of the
plant so that its main contribution to yield may be in producing a good
stand. If this is so, an adjustment for stand will cancel out the treatment
effects. A test of $T_{xx} = 116{,}020.33$ against $E_{xx} = 28{,}665.10$ gives
$F_x = 20.24$, a highly significant value. This indicates that there were
actually differences in stand from one treatment to another. Furthermore
we note that a test of the treatment effects on yield not adjusted for
stand gives

$$F = \frac{112.8562/6}{23.2326/30} = 24.29,$$

also a highly significant value. Hence we conclude that there were
definite treatment effects on yield but that these effects probably were
arrived at indirectly through different stands, which then resulted in
different yields. One further complication might be mentioned. Yields
for low stand are often higher per plant than for high stand because of
less competition for the available plant food and moisture. This apparently
did not happen in our sugar beet example but in general should be
considered. Also, it should be noted again that we considered only the
linear effect of stand on yield; often the effect may be curvilinear. A
final point is that although the treatments did affect the stand, the stand
could be measured without error.

The unadjusted and adjusted treatment means are:

Treatment        1       2       3       4       5       6       7      All
Unadjusted     3.353   5.630   3.563   6.487   7.080   3.538   7.353   5.286
Adjusted       4.693   5.399   4.931   5.828   5.526   4.878   5.746   5.286

The variance of an unadjusted mean difference would have been
$2s^2/r = 2(.7744)/6 = .258$, against an average variance of .135 for the
difference between two adjusted means. Hence the relative efficiency is
$I = .258/.135 = 1.91$.
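
The arithmetic of Example 21.1 can be retraced with a short program. The following is only a sketch: the function names (`product_lines`, `ancova`) and the dictionary keys are illustrative choices, not notation from the text, and the data are the stand and yield figures of Table 21.1 keyed in by treatment (rows) and block (columns).

```python
# Stand (X) and yield (Y) from Table 21.1; rows are treatments 1-7, columns are blocks 1-6.
STAND = [
    [183, 176, 291, 254, 225, 249],
    [356, 300, 301, 271, 288, 258],
    [224, 258, 244, 217, 192, 236],
    [329, 283, 308, 326, 318, 318],
    [371, 354, 352, 331, 290, 410],
    [230, 221, 237, 193, 247, 250],
    [322, 367, 400, 333, 314, 385],
]
YIELD = [
    [2.45, 2.25, 4.38, 4.35, 3.42, 3.27],
    [6.71, 5.44, 4.92, 5.23, 6.74, 4.74],
    [3.22, 4.14, 2.32, 4.42, 3.28, 4.00],
    [6.34, 5.44, 5.22, 8.00, 6.96, 6.96],
    [6.48, 7.11, 5.88, 7.54, 6.61, 8.86],
    [3.70, 3.24, 2.82, 2.15, 5.19, 4.13],
    [6.10, 7.68, 7.37, 7.83, 7.75, 7.39],
]


def product_lines(u, v):
    """Total, block, treatment, and error sums of products of u and v."""
    p, r = len(u), len(u[0])
    cf = sum(map(sum, u)) * sum(map(sum, v)) / (p * r)  # correction term
    total = sum(u[i][j] * v[i][j] for i in range(p) for j in range(r)) - cf
    treatments = sum(sum(u[i]) * sum(v[i]) for i in range(p)) / r - cf
    blocks = sum(
        sum(row[j] for row in u) * sum(row[j] for row in v) for j in range(r)
    ) / p - cf
    return {"total": total, "blocks": blocks, "treatments": treatments,
            "error": total - blocks - treatments}


def ancova(x, y):
    """Simple covariance analysis for a randomized-blocks layout."""
    p, r = len(x), len(x[0])
    xx, xy, yy = product_lines(x, x), product_lines(x, y), product_lines(y, y)
    b = xy["error"] / xx["error"]                      # error regression coefficient
    sse_adj = yy["error"] - b * xy["error"]            # SSE*
    s2_adj = sse_adj / ((r - 1) * (p - 1) - 1)         # one d.f. lost for b
    # (error)' line = error + treatments, adjusted in the same way
    xx1 = xx["error"] + xx["treatments"]
    xy1 = xy["error"] + xy["treatments"]
    yy1 = yy["error"] + yy["treatments"]
    sst_adj = (yy1 - xy1 ** 2 / xx1) - sse_adj         # adjusted treatment SS
    f_ratio = (sst_adj / (p - 1)) / s2_adj
    x_bar = sum(map(sum, x)) / (p * r)
    adjusted_means = [sum(y[i]) / r - b * (sum(x[i]) / r - x_bar) for i in range(p)]
    # average variance of the difference between two adjusted means
    avg_var_diff = 2 * s2_adj / r * (1 + xx["treatments"] / ((p - 1) * xx["error"]))
    return {"b": b, "sse_adj": sse_adj, "s2_adj": s2_adj, "sst_adj": sst_adj,
            "F": f_ratio, "adjusted_means": adjusted_means,
            "avg_var_diff": avg_var_diff}


result = ancova(STAND, YIELD)
print(round(result["b"], 6), round(result["F"], 2))
print([round(m, 3) for m in result["adjusted_means"]])
```

The sketch works from the raw plot values rather than from the printed sums, so small differences in the last decimal place from the hand computation are to be expected.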

21.3. The Use of Simple Covariance for Other Experimental Designs. 

No attempt will be made in this book to outline in detail the computational
procedures for the use of simple covariance with other experimental
designs. The computing procedure for other complete-blocks designs is
exactly the same as for randomized blocks, namely:

(i) Set up the usual analysis of variance given in Chap. 18, but include
the sum of squares for x and the sum of cross products as well as the sum
of squares for y for each line in the analysis.

(ii) Compute the error line by subtraction for SSE, $E_{xy}$, and $E_{xx}$.

(iii) Compute $SSE^* = SSE - bE_{xy} = SSE - E_{xy}^2/E_{xx}$. $MSE^* = s^{*2} = SSE^*/(\text{error d.f.} - 1)$.

(iv) Compute the (error + treatment) = (error)' line by adding the
treatment line to the error line.

(v) $(SSE^*)' = SSE' - b'E'_{xy}$.

(vi) Compute the adjusted sum of squares for treatments,

$$SST^* = (SSE^*)' - SSE^*, \qquad MST^* = SST^*/(p - 1).$$

(vii) $F = MST^*/s^{*2}$.

(viii) $t_i^* + \bar{Y} = \bar{T}_i - b(\bar{X}_{i\cdot} - \bar{X})$.

(ix) $s^2(d^*) = \dfrac{2s^{*2}}{r}\left[1 + \dfrac{T_{xx}}{(p - 1)E_{xx}}\right]$.


If the experimenter wishes to test a single degree of freedom of the
(p - 1) treatment degrees of freedom, he must go through the same procedure
as outlined above with this one degree of freedom replacing the
treatment line in the computations. If several of these single components
are desired, an approximate method of computation has been suggested
by Cochran and Cox. Compute

$$S(y - bx)^2 = Sy^2 - 2bSxy + b^2Sx^2$$

for each component of the treatment sum of squares. This computation
would give the correct value of SSE but overestimates each of the components
of SST. However, the bias is generally small, and if the experiment
is a complicated factorial, much labor is saved in testing the various
main effects and interactions.

See reference 6 for the use of covariance with lattice designs. 

21.4. The Use of Multiple Covariance. 

$$Y_{ij} = \mu + \tau_i + \beta_j + \sum_{k=1}^{m} \beta_k X_{kij} + \epsilon_{ij}.$$

Snedecor presents an example of a randomized blocks experiment on
wheat yields in Great Britain with two fixed variates (height of shoots at
ear emergence and number of plants at tillering). The treatments were
6 different places, and the blocks were 3 different years. The computing
procedure is as follows for m fixed variates:

(i) Derive a table of sums of squares and cross-products for total,
blocks, and treatments (and any other component in the analysis, such
as columns in a Latin square).

(ii) Compute the error line for each sum of squares and cross products
by subtracting blocks, treatments, and any other component from the
total. Use these error values to compute the $\{b_k\}$ and $R_E^2$, the squared
multiple correlation coefficient for the error line.

(iii) Compute an (error + treatment) line and $R_{E'}^2$.

(iv) $SSE^* = SSE - R_E^2\,SSE = SSE(1 - R_E^2)$.

$(SSE^*)' = SSE'(1 - R_{E'}^2)$.

(v) $s^{*2} = SSE^*/(\text{error d.f.} - \text{number of fixed variates})$.

$MST^* = [(SSE^*)' - SSE^*]/(p - 1)$.

$F = MST^*/s^{*2}$.

(vi) $t_i^* + \bar{Y} = \bar{T}_i - \sum_{k=1}^{m} b_k(\bar{X}_{ki\cdot} - \bar{X}_k)$, where $b_k$ is estimated in (ii).

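(vii) If estimates of the variances of adjusted treatment differences are
wanted, the b's should be derived by a matrix-inversion method in order
to obtain the variances and covariances of the b's.

$$s^2(t_i^* - t_j^*) = s^{*2}\left[\frac{2}{r} + \sum_{k}\sum_{k'} c_{kk'}(\bar{X}_{ki\cdot} - \bar{X}_{kj\cdot})(\bar{X}_{k'i\cdot} - \bar{X}_{k'j\cdot})\right],$$

where $c_{kk'}$ is the element in the inverse matrix of the error line. The
average variance of adjusted treatment differences is

$$\frac{2s^{*2}}{r}\left[1 + \frac{\sum_{k}\sum_{k'} c_{kk'}T_{kk'}}{p - 1}\right],$$

where $T_{kk'}$ is the sum of cross products for $(X_kX_{k'})$ in the treatment line
of (i),

$$T_{kk'} = \frac{1}{r}\,S_i\left(X_{ki\cdot} - \frac{X_{k\cdot\cdot}}{p}\right)\left(X_{k'i\cdot} - \frac{X_{k'\cdot\cdot}}{p}\right),$$

with $X_{ki\cdot}$ the total of $X_k$ for the ith treatment and $X_{k\cdot\cdot}$ the grand total.
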

EXERCISES 

21.1. Derive these relationships for $E_{xy}$ and $E_{xx}$:

$$E_{xy} = S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})(y_{ij} - \bar{B}_j - \bar{T}_i),$$

$$E_{xx} = S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})^2.$$

21.2. (a) Derive Finney's result for the average variance of the difference
between two adjusted treatment means, presented in Sec. 21.2.
(b) Also, derive the result for multiple covariance given in Sec. 21.4.

21.3. (a) Make a complete least-squares solution of a simple covariance 
analysis for a Latin-square design. 

(b) Illustrate the results in (a) by analyzing the following 5 × 5 Latin-square
experiment on the yield in bags per acre of No. 1 Irish potatoes
(Y), adjusted for the percentage of No. 1's (X). The treatments were
different amounts (pounds) of P₂O₅ per acre: a = 0, b = 40, c = 80,
d = 120, e = 160.

                                        Columns
             1              2              3              4              5             Total
Rows     t    Y    X    t    Y    X    t    Y    X    t    Y    X    t    Y    X      Y       X
1        a  134.0  91   b  149.1  88   c  141.3  87   d  161.3  91   e  149.2  91    734.9    448
2        b  148.5  90   d  148.5  91   e  199.3  94   a  148.5  90   c  152.7  93    797.5    458
3        c  145.2  93   e  149.5  95   a  119.9  90   b  149.2  94   d  145.8  90    709.6    462
4        d  171.1  91   c  169.0  94   b  144.9  89   e  170.8  95   a  130.4  88    786.2    457
5        e  175.8  91   a  153.4  94   d  168.9  92   c  167.6  96   b  141.5  93    807.2    466
Total       774.6 456      769.5 462      774.3 452      797.4 466      719.6 455   3,835.4  2,291

$SY^2 = 595{,}038.38$, $SXY = 351{,}944.8$, $SX^2 = 210{,}085$.

Show that $b = 2.0616$ and $s^{*2} = 127.18$.

(c) Analyze the linear component of the treatment mean square by 
the exact method and by the Cochran and Cox approximation presented 
in Sec. 21.3. Then show that the deviations from a linear trend are not 
significant. 

21.4. Consider a single degree of freedom to test for the adjusted effect
of phosphate in the sugar beet example (Example 21.1):

$$t = -t_1 + t_2 - t_3 + t_4 - t_6 + t_7.$$

(a) Use the exact procedure to show that the effect of phosphate is
significant even after adjusting for stand.

(b) Show that the approximate procedure presented in Sec. 21.3 cannot
be used in this case.

21.5. The vitamin B₂ example used in Chap. 15 and Exercise 18.6 can
also be analyzed by covariance.

(a) Analyze the effects of the three soil moistures on the amount of
vitamin B₂ after adjusting for the effect of $X_1$.

(b) Analyze the effect of these three soil moistures after adjusting for
$X_1$ and $X_3$.

(c) Can the fixed variates be considered independent of the soil 
moistures ? 

21.6. Johnson and Tsao analyzed the influence of sex, scholastic standing,
individual order, and grade on education development as measured
by the Iowa Tests of Education Development. There were 3 of each of the
last 3 variables and the 2 sexes, giving a total of 54 treatment combinations.
An initial score and the mental age of each student were determined
before the development tests were administered. Only one student
was tested for each treatment combination. The final scores (Y),
initial scores ($X_1$), and mental age ($X_2$) were as shown in the accompanying
table:


Data of Exercise 21.6

         Scholastic   Individual      Grade 10          Grade 11          Grade 12
Sex      standing     order         Y    X₁   X₂      Y    X₁   X₂      Y    X₁   X₂
Male     Good         1            30    28   45     26    22   62     29    25   60
                      2            25    22   58     26    21   57     29    24   88
                      3            22    19   46     24    21   65     22    19   64
         Average      1            26    22   56     24    25   54     23    21   64
                      2            17    14   19     23    18   55     20    17   47
                      3            14    14   29     15    13   24     19    17   75
         Poor         1            18    18   34     18    17   40     17    16   29
                      2            17    14   17     16    13   24     15    15   38
                      3            12     9   19     13    12   23     14    12   28
Female   Good         1            21    16   44     26    22   60     33    29   94
                      2            21    21   44     25    22   57     29    29   89
                      3            19    17    6     23    19   52     25    22   78
         Average      1            20    18   38     22    19   54     23    21   50
                      2            18    16   27     21    19   54     18    19   57
                      3            14    14   18     17    16   52     17    17   43
         Poor         1            14     9   18     19    17   40     15    13   36
                      2            12     7   18     15    12   28     15    14   35
                      3             9     7    5     13    12   48     10     9   14

$SY = 1{,}068$, $SX_1 = 944$, $SX_2 = 2{,}379$,

$SY^2 = 22{,}730$, $SX_1^2 = 17{,}926$, $SX_2^2 = 127{,}639$,

$SX_1Y = 20{,}116$, $SX_2Y = 52{,}005$, $SX_1X_2 = 46{,}227$.



(a) Set up an analysis of simple covariance on Y, using $X_2$ as the fixed
variate with the main effects of sex, scholastic standing, order, and grade
pulled out plus all two-factor interactions. Pool the three-factor and
four-factor interactions as error (see Sec. 20.2).

(b) Repeat (a), but with both dependent variates.

21.7. Covariance was used to reduce the experimental error in a randomized
blocks experiment on scuppernong grapes. Four blocks and 5
magnesium treatments were used. Four plants were used per plot. The
yields were in terms of pounds per plant. Two fixed variates were used
to estimate yielding ability prior to treatment: $X_1$ = a score of 1 to 5
given by the investigators as to the vigor and size of each of the eight
arms on each plant; $X_2$ = diameter of each arm at a point 18 inches from
the crown. For both X's, the individual arm measurements were
cumulated for each plant. The sums of squares and cross products were
as follows:

               d.f.      Sy²       Sx₁y      Sx₂y     Sx₁x₂     Sx₁²      Sx₂²
Total           19    1,001.90    306.25    629.55    308.64    154.52    963.14
Blocks           3      223.50     39.81    -85.95      9.07     20.64     77.19
Treatments       4      144.60     64.54    246.90    113.11     32.89    436.32

(a) Complete the analysis of covariance using only $X_1$. What was the
efficiency of the covariance analysis?

(b) Repeat (a) using $X_2$ also.

(c) The treatment means per plant were as follows:

          I        II       III      IV       V       Total
Y       13.79    20.28    17.77    19.75    14.33    17.18
X₁      19.56    22.88    22.56    22.56    20.81    21.67
X₂      64.44    75.06    72.56    73.69    64.31    70.01

Derive the adjusted treatment means in (a) and (b) and the average
standard error of the difference between two adjusted means.

21.8. Covariance can also be used to make an analysis of variance when
one or more of the plots are missing: this is necessary only when a double
or higher restriction is made on the design, as with a randomized
complete-blocks or a Latin-square design. The procedure is to set Y = 0
and X = -1 for the missing plot and X = 0 elsewhere. If there are
several missing plots, multiple covariance must be used, with $X_1 = -1$
for the first one, $X_2 = -1$ for the second, etc. Use the method of covariance
to make the analysis of Exercise 18.12. For the application of these
methods to other designs, see reference 9.

21.9. For the application of covariance techniques to disproportionate 
frequency problems, see reference 10. 

21.10. The analysis of covariance is often used to determine whether
or not the same regression coefficient ($\beta$) applies to all treatments.
Snedecor presents an example of this type of analysis. Use the vitamin
B₂ data [Exercise 21.5(a)] to compute a separate b (for $X_1$) for each of the
three soil moistures. Then set up the following analysis, where $b_i$ is the
regression coefficient of Y on $X_1$ for each of the three treatments (i = 1, 2, 3).

Source of variation                            Degrees of freedom    SS                  MS
Deviations from average regression (b)                 23            SSE*
Deviations from individual regressions                 21            (SSE)''
Differences among individual regressions                2            SSE* - (SSE)''      [SSE* - (SSE)'']/2

$$F = [SSE^* - (SSE)'']/2s''^2,$$

$$(SSE)'' = Sy^2 - SST - [b_1 S_1 x_1y + b_2 S_2 x_1y + b_3 S_3 x_1y],$$

where $S_i$ applies to summation over treatment i.

21.11. In Example 21.1, neglect the block effects and apply the
methods of Exercise 21.10 to determine a separate regression line for
each treatment. Graph these data, and draw in each regression line
and the over-all regression line.

CHAPTER 22


VARIANCE COMPONENTS: ALL RANDOM COMPONENTS,
EXCEPT THE MEAN

22.1. Introduction. The regression models used in the previous
chapters of Part II assumed that all variates were fixed except for a single
random error term. Most experiments are designed so that several components
are random instead of all except one being fixed. That is, the
blocks in a randomized-blocks design may be assumed to be randomly
selected from a large population of blocks, so that the block-to-block part
of the analysis of variance is also a random component. And in sampling
experiments and surveys and in genetics experiments, the model almost
always postulates several random components; for example, (i) a sample
of soils to determine the basic sources of variability, such as plot-to-plot
differences, sample-to-sample differences in the same plot, and laboratory
determination errors; (ii) a sample survey covering an entire region with a
few counties selected from the large number of counties in the region, then
a few areas selected from each sample county, and perhaps only one or
two families from each selected area; (iii) a corn-breeding experiment with
parents taken from a population produced by random mating, the progeny
randomly assigned to plots, and a random selection of plants from each
plot.

The regression model using several sources of random variation can take
on one of two forms: (a) every variate, except the general mean, is a random
variate, or (b) there is a mixture of random and fixed variates.
Eisenhart has presented the basic difference between (a) and the model
used in the previous chapters of Part II and has indicated the importance
of (b) without delving into the many theoretical difficulties involved in its
use. Crump² presents the basic theory for (a) and includes an extensive
bibliography of the use of components of variance. Crump presents a
more recent bibliography of articles on variance components and discusses
problems which merit further investigation. R. A. Fisher indicated
the additive properties of variances in the first edition of his
Statistical Methods for Research Workers. Yates and Zacopanay indicated
the application of these methods to field sampling; among others,
Cochran extended them to enumerative surveys. Much of the recent
development in the field of quantitative genetics has been built on a
variance-components model, as discussed in references 7 and 8.

One of the authors of this book has presented the basic theory required
for the mixed model (b) in an article on the analysis of price data. The
reader is cautioned that this last article has many drawbacks from an
economic point of view, but the difficulties encountered in the use of a
mixed model are adequately outlined therein. This article drew much
of its theory from articles by Daniels and Satterthwaite. The mixed
model is also required in the analysis of a series of experiments, such as
those conducted over several years and at several places. Cochran and
Cox indicate some of the difficulties for this type of experimentation.

We shall discuss general uses of model (a) in this chapter, mixed models
in Chap. 23, and the use of variance components for balanced
incomplete-blocks designs and lattice designs in Chap. 24 and shall give a general
summary in Chap. 25.

22.2. A Randomized-blocks Model. Let us assume that we have p
treatments, each allocated to each of r blocks, and q samples taken from
each of the pr plots, assuming that in each case the particular treatment,
block, and sample are random samples from infinite (at least, very
large) populations.† Hence there are prq samples. The model for the
yield of the kth sample from the ith treatment on the jth block is

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk},$$

where $\tau_i$, $\beta_j$, $(\tau\beta)_{ij}$, and $\epsilon_{ijk}$ are all assumed to be NID with means 0 and
variances $\sigma_t^2$, $\sigma_b^2$, $\sigma_{tb}^2$, and $\sigma_e^2$, respectively. Again it should be made clear
that for purposes of estimation, the assumption of normality is not
needed. This assumption is required, however, if the usual tests of
significance and confidence limits are used. These variances are called
variance components. Hence

$$E(Y_{ijk}) = \mu, \qquad \sigma^2(Y_{ijk}) = \sigma_t^2 + \sigma_b^2 + \sigma_{tb}^2 + \sigma_e^2.$$

An experiment of this kind is set up to estimate the mean, $\mu$, and the
variance of this estimate and/or to obtain estimates of the variance components
themselves. The statistical geneticists, for example, are interested
mainly in the variance components. If the experiment was set up
to obtain an estimate of $\mu$, it is understood that the estimate, $\bar{Y}$, will
deviate from $\mu$ because only a sample of treatments, blocks, and plots (or
samples from the subclasses) is used in the experiment. In other words,
from one experiment to another, one could expect different values of the
$\{\tau_i\}$, $\{\beta_j\}$, $\{(\tau\beta)_{ij}\}$, and $\{\epsilon_{ijk}\}$ to appear. The results of the experiment can
be used to estimate the variability of these random variables and hence
to furnish an estimate of the variance of $\bar{Y}$. This estimate of $\sigma^2(\bar{Y})$ can
be used to construct confidence limits for $\mu$ and to indicate how changes in
future experimental plans might affect the precision of the estimate. The
reader will note that this type of experimentation is fundamentally different
from that described in the previous four chapters, even though the
same analysis is used. In those chapters we were interested in estimating
certain treatment effects under very limited experimental conditions and
in making tests regarding the differences between these effects. Here we
want to estimate a single mean with application to a wider area than that
of the experimental plots used in the experiment.

† Or we might consider r blocks with pq plots per block, each treatment being
assigned at random to q plots per block. Although the two types of experiments are
analyzed the same, the interpretations are different. The procedure given in the
body of the text produces an estimate of sample-to-sample fluctuation in the same
plot, while the one mentioned in this footnote produces an estimate of the plot-to-plot
variation within a block.

If the experimenter is interested only in the variance components, the
problem is certainly much different from that of estimating a mean. In
this case he wants confidence limits for the variance components, or functions
of the variance components. We have stated previously that nonnormality
was probably not too serious for estimating the confidence
limits for means, and if it were serious, some simple transformation would
generally remedy the situation. However, the situation is not so simple
for variance components. As we shall see later, the only estimates of the
variances of these estimated variance components will be in terms of the
components themselves, a satisfactory condition if normality holds.
However, slight deviations from normality may require a knowledge of
other data than that furnished by the analysis of variance (such as higher
moments) in order to estimate these variances. As will be shown later,
the problem of setting up confidence limits for variance components is far
from solved.

The estimates of the variance components will be designated as $\hat{\sigma}^2 = s^2$
with the same subscripts. Obviously $\hat{\mu} = \bar{Y}$. Since we have more than
one component of variation in this model, the method of least squares
cannot be used in the estimation process. Instead we must utilize a more
general estimation procedure, such as the method of maximum likelihood.
In order to present this material in its practical connection with the
analysis-of-variance problems discussed previously, we shall assume that
the data have been summarized in an analysis-of-variance table and then
proceed to derive the expected values of the mean squares, using our
variance-components model. Then the estimates of these variance components
are found by equating the mean squares and their expectations,
where each $\sigma^2$ is replaced by its corresponding $s^2$ in the expectations.
These analysis-of-variance estimates are the same as the maximum-likelihood
estimates for an orthogonal model such as the randomized
complete-blocks model, if the errors are normally distributed. For nonorthogonal
models, such as a disproportionate frequency factorial, the
maximum-likelihood equations are very difficult to solve (requiring
iterative procedures in many cases), and the two sets of estimates would
not be the same. We can derive unbiased estimates of the variance components
from the analysis of variance, but there is no guarantee that these
estimates are the most efficient which could be devised; in fact it is often
doubtful that they are even moderately efficient for multiple-classification
data with unequal numbers in the subclasses.

The analysis of variance pertaining to the randomized complete-blocks
model is:

Source of variation   Degrees of freedom   Sum of squares   Mean square†      E(V)
Blocks                r - 1                SSB              MSB = V_1         $\sigma_e^2 + q\sigma_{tb}^2 + pq\sigma_b^2 = \sigma_1^2$
Treatments            p - 1                SST              MST = V_2         $\sigma_e^2 + q\sigma_{tb}^2 + rq\sigma_t^2 = \sigma_2^2$
T × B                 (r - 1)(p - 1)       SS(TB)           MS(TB) = V_3      $\sigma_e^2 + q\sigma_{tb}^2 = \sigma_3^2$
Sampling error        (q - 1)rp            SSE              MSE = V_4         $\sigma_e^2 = \sigma_4^2$

† We shall use V to stand for a mean square.


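$$SSB = \frac{\sum_j (B_j - G/r)^2}{pq} = \frac{\sum_j [(B_j - pq\mu) - (G - rpq\mu)/r]^2}{pq} = \frac{\sum_{j=1}^{r} (B_j - pq\mu)^2}{pq} - \frac{(G - rpq\mu)^2}{rpq},$$

$$B_j - pq\mu = q\sum_i \tau_i + pq\beta_j + q\sum_i (\tau\beta)_{ij} + \sum_i\sum_k \epsilon_{ijk},$$

$$G - rpq\mu = rq\sum_i \tau_i + pq\sum_j \beta_j + q\sum_i\sum_j (\tau\beta)_{ij} + \sum_i\sum_j\sum_k \epsilon_{ijk}.$$

Hence

$$E(SSB) = r\,\frac{pq^2\sigma_t^2 + p^2q^2\sigma_b^2 + pq^2\sigma_{tb}^2 + pq\sigma_e^2}{pq} - \frac{r^2pq^2\sigma_t^2 + rp^2q^2\sigma_b^2 + rpq^2\sigma_{tb}^2 + rpq\sigma_e^2}{rpq}$$

$$= (r - 1)pq\sigma_b^2 + (r - 1)q\sigma_{tb}^2 + (r - 1)\sigma_e^2,$$

$$E(V_1) = pq\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2 \equiv \sigma_1^2.$$

Since SST will be the same as SSB except for p and r and $\sigma_b^2$ and $\sigma_t^2$
interchanged, it is evident that

$$SST = \frac{\sum_i (T_i - rq\mu)^2}{rq} - \frac{(G - rpq\mu)^2}{rpq},$$

$$T_i - rq\mu = rq\tau_i + q\sum_j \beta_j + q\sum_j (\tau\beta)_{ij} + \sum_j\sum_k \epsilon_{ijk},$$

$$E(SST) = (p - 1)(rq\sigma_t^2 + q\sigma_{tb}^2 + \sigma_e^2),$$

$$E(V_2) = rq\sigma_t^2 + q\sigma_{tb}^2 + \sigma_e^2 \equiv \sigma_2^2.$$
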

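Also, we know that

$$SS(TB) = \frac{1}{q}\sum_{ij}\left[(TB)_{ij} - \frac{T_i}{r} - \frac{B_j}{p} + \frac{G}{rp}\right]^2
= \frac{1}{q}\sum_{ij}\left[(TB)_{ij} - q\mu - \frac{T_i - rq\mu}{r} - \frac{B_j - pq\mu}{p} + \frac{G - rpq\mu}{rp}\right]^2$$

$$= \frac{\sum_{ij}[(TB)_{ij} - q\mu]^2}{q} - \frac{\sum_i (T_i - rq\mu)^2}{rq} - \frac{\sum_j (B_j - pq\mu)^2}{pq} + \frac{(G - rpq\mu)^2}{rpq},$$

where $(TB)_{ij} - q\mu = q[\tau_i + \beta_j + (\tau\beta)_{ij}] + \sum_k \epsilon_{ijk}$.

Hence

$$E[SS(TB)] = rp[q(\sigma_t^2 + \sigma_b^2 + \sigma_{tb}^2) + \sigma_e^2] - p[rq\sigma_t^2 + q\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2]$$

$$\qquad - r[q\sigma_t^2 + pq\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2] + [rq\sigma_t^2 + pq\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2]$$

$$= q(rp - p - r + 1)\sigma_{tb}^2 + (rp - p - r + 1)\sigma_e^2$$

$$= (r - 1)(p - 1)(q\sigma_{tb}^2 + \sigma_e^2),$$

$$E(V_3) = q\sigma_{tb}^2 + \sigma_e^2 \equiv \sigma_3^2.$$

Finally we know that the total sum of squares is

$$Sy^2 = S(Y - G/rpq)^2 = S(Y - \mu)^2 - (G - rpq\mu)^2/rpq.$$

The expected value of this total is

$$rq(p - 1)\sigma_t^2 + pq(r - 1)\sigma_b^2 + q(rp - 1)\sigma_{tb}^2 + (rpq - 1)\sigma_e^2.$$

It is easy to show that

$$E(V_4) = \sigma_e^2 \equiv \sigma_4^2.$$

The residual also can be expressed as†

$$\epsilon'_{ijk} = Y_{ijk} - (TB)_{ij}/q = \epsilon_{ijk} - \sum_k \epsilon_{ijk}/q,$$

$$\sigma^2(\epsilon') = (q - 1)\sigma_e^2/q, \qquad E(SSE) = rp(q - 1)\sigma_e^2.$$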
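
As a quick consistency check on the four expectations, the expected sums of squares for blocks, treatments, T × B, and sampling error should add up to the expected total sum of squares. The sketch below confirms this for one arbitrary, hypothetical set of values of p, r, q and the variance components; the function names are illustrative only.

```python
def expected_mean_squares(p, r, q, var_t, var_b, var_tb, var_e):
    """Return {line: (degrees of freedom, E(V))} for the all-random model."""
    return {
        "blocks": (r - 1, var_e + q * var_tb + p * q * var_b),
        "treatments": (p - 1, var_e + q * var_tb + r * q * var_t),
        "TxB": ((r - 1) * (p - 1), var_e + q * var_tb),
        "sampling error": (r * p * (q - 1), var_e),
    }


def expected_total_ss(p, r, q, var_t, var_b, var_tb, var_e):
    """Expected value of the total sum of squares about the sample mean."""
    return (r * q * (p - 1) * var_t + p * q * (r - 1) * var_b
            + q * (r * p - 1) * var_tb + (r * p * q - 1) * var_e)


# Hypothetical design and components, chosen only for illustration.
design = dict(p=4, r=3, q=2, var_t=5, var_b=3, var_tb=2, var_e=1)
ems = expected_mean_squares(**design)
sum_of_lines = sum(f * v for f, v in ems.values())
print(sum_of_lines, expected_total_ss(**design))
```

With integer inputs the two totals agree exactly, which is a useful guard against a slip in any one of the E(V) expressions.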


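$$\bar{Y} = \mu + \frac{\sum_i \tau_i}{p} + \frac{\sum_j \beta_j}{r} + \frac{\sum_{ij} (\tau\beta)_{ij}}{rp} + \frac{\sum_{ijk} \epsilon_{ijk}}{rpq},$$

$$E(\bar{Y}) = \mu \quad\text{and}\quad \sigma^2(\bar{Y}) = \frac{\sigma_t^2}{p} + \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{rp} + \frac{\sigma_e^2}{rpq} = \frac{\sigma_1^2 + \sigma_2^2 - \sigma_3^2}{rpq}.$$

If a particular treatment mean is of importance,

$$\bar{T}_i = \frac{T_i}{rq} = \mu + \tau_i + \frac{\sum_j \beta_j}{r} + \frac{\sum_j (\tau\beta)_{ij}}{r} + \frac{\sum_{jk} \epsilon_{ijk}}{rq},$$

$$\sigma^2(\bar{T}_i) = \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{r} + \frac{\sigma_e^2}{rq} = \frac{\sigma_1^2 + (p - 1)\sigma_3^2}{rpq}.$$

This assumes $\tau_i$ is now a fixed variate (see Chap. 23).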

The relative importance of various components can be assessed in
$\sigma^2(\bar{Y})$ or $\sigma^2(\bar{T}_i)$ and compared with the costs of obtaining the sample to
enable the experimenter better to plan his future experiments to estimate
$\mu$ or $\tau_i$. For example, if it costs $C_e$ for each sample in a plot and $C_p$ for
each plot, then the total cost per treatment is

$$C_i = r'q'C_e + r'C_p,$$

where r' plots are sampled and q' samples per plot. Suppose now that
the total cost per treatment is fixed at C. We are then led to minimize
$\sigma^2(\bar{T}_i)$ subject to the restriction $C_i = C$, in other words, to select r' and q'
so as to minimize

$$\sigma^2(\bar{T}_i) + \lambda(C_i - C).$$

Setting the derivatives with respect to q', r', and $\lambda$ equal to zero gives

$$-\frac{\sigma_e^2}{r'q'^2} + \lambda(r'C_e) = 0,$$

$$-\frac{1}{r'^2}\left[\sigma_b^2 + \sigma_{tb}^2 + \frac{\sigma_e^2}{q'}\right] + \lambda(q'C_e + C_p) = 0,$$

$$C_i = C.$$

The solutions are

$$q' = \sqrt{\frac{C_p\sigma_e^2}{C_e(\sigma_b^2 + \sigma_{tb}^2)}}, \qquad r' = \frac{C}{q'C_e + C_p}.$$

Of course q' and r' must be integers, and so the exact minimum in general
will not be reached.

† See the footnote on page 314 for a discussion of this residual. $\epsilon$ may represent
either sampling error or plot-to-plot error, depending on the randomization plan.
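
The allocation rule can be tried numerically. In the sketch below the costs and variance components are hypothetical values chosen for illustration, and the function names are not from the text; the code evaluates q' and r' from the solutions above and compares the resulting variance with two other allocations of the same total cost.

```python
import math


def optimal_allocation(total_cost, cost_sample, cost_plot, var_b, var_tb, var_e):
    """Continuous (non-integer) optimum: plots r' and samples per plot q'."""
    q_opt = math.sqrt(cost_plot * var_e / (cost_sample * (var_b + var_tb)))
    r_opt = total_cost / (q_opt * cost_sample + cost_plot)
    return r_opt, q_opt


def variance_of_treatment_mean(r, q, var_b, var_tb, var_e):
    """Variance of a treatment mean based on r plots with q samples each."""
    return (var_b + var_tb) / r + var_e / (r * q)


# Hypothetical inputs: C = 80, C_e = 1 per sample, C_p = 4 per plot.
components = dict(var_b=0.6, var_tb=0.4, var_e=4.0)
r_opt, q_opt = optimal_allocation(80.0, 1.0, 4.0, **components)
best = variance_of_treatment_mean(r_opt, q_opt, **components)


def variance_at(q, total_cost=80.0, cost_sample=1.0, cost_plot=4.0):
    """Variance when q samples per plot are taken and all of C is spent."""
    r = total_cost / (q * cost_sample + cost_plot)
    return variance_of_treatment_mean(r, q, **components)


print(r_opt, q_opt, best, variance_at(2.0), variance_at(8.0))
```

In practice r' and q' would be rounded to neighboring integers and the variance compared at each, since the exact minimum is generally not attainable.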

In order to make the estimates of the variance components ($\sigma_t^2$, $\sigma_b^2$, $\sigma_{tb}^2$,
and $\sigma_e^2$) unbiased, we merely replace the components by their estimates
($s_t^2$, $s_b^2$, $s_{tb}^2$, and $s_e^2$, respectively) in the E(V) column and equate respective
rows in the V and E(V) columns. Hence $s_e^2 = V_4$, and

$$s_{tb}^2 = \frac{V_3 - V_4}{q}, \qquad s_t^2 = \frac{V_2 - V_3}{rq}, \qquad s_b^2 = \frac{V_1 - V_3}{pq}.$$
It can be shown that all of the quantities $[B_j - G/r]$, $[T_i - G/p]$,
$[Y_{ijk} - (TB)_{ij}/q]$, and $[(TB)_{ij} - T_i/r - B_j/p + G/rp]$ are orthogonal to
one another. Hence the various mean squares are noncorrelated. Since
any $V_i$ is a sum of squared linear functions of NID variates, it is independently
distributed as $\chi^2\sigma_i^2/f_i$ with $f_i$ degrees of freedom, where i stands
for some particular line in the analysis-of-variance table with $f_i$ degrees of
freedom in that line and $\sigma_i^2 = E(V_i)$. Also,†

$$\sigma^2(V_i) = \frac{2\sigma_i^4}{f_i}, \qquad s^2(V_i) = \frac{2V_i^2}{f_i + 2}.$$

† Some attention needs to be paid to the use of the fourth moment of the observations
in estimating $\sigma^2(V_i)$. In some sampling problems, it may be possible to obtain
several independent samples, determining $V_i$ from each, and then estimating $\sigma^2(V_i)$
from the sample-to-sample fluctuations.

In order to use the method of maximum likelihood to estimate the
variance components, some preliminary transformations are needed.
When the only random effect is $\epsilon$ ($\tau$, $\beta$, and $\tau\beta$ are fixed effects), the $Y_{ijk}$ are

$$NID(\mu + \tau_i + \beta_j + (\tau\beta)_{ij};\ \sigma_e^2).$$

However, when some of the other effects are also random, the $Y_{ijk}$ are
correlated. For example when everything is random except $\mu$, the
covariance between $Y_{ijk}$ and $Y_{ijk'}$ is

$$\sigma_t^2 + \sigma_b^2 + \sigma_{tb}^2.$$

Hence the simultaneous distribution of the Y's is a multivariate normal
distribution with nonzero covariance elements. This distribution can be
simplified by splitting $(Y_{ijk} - \bar{Y})$ into the sum of orthogonal parts as
follows:

$$Y_{ijk} - \bar{Y} = (qY_{ijk} - TB_{ij})/q + (TB_{ij} - T_i/r - B_j/p + G/rp)/q + (pT_i - G)/rpq + (rB_j - G)/rpq,$$

where each quantity in parentheses is orthogonal to every other such
quantity and has an expectation of zero.†
Therefore

$$S_{ijk}(Y_{ijk} - \bar{Y})^2 = S_{ijk}(Y_{ijk} - TB_{ij}/q)^2 + S_{ij}(TB_{ij} - T_i/r - B_j/p + G/rp)^2/q + S_i(T_i - G/p)^2/rq + S_j(B_j - G/r)^2/pq$$

$$= (q - 1)rpV_4 + (r - 1)(p - 1)V_3 + (p - 1)V_2 + (r - 1)V_1.$$

If the components are NID, these four sums of squares are independently
distributed as $\chi^2\sigma_i^2$ with $f_i$ degrees of freedom, where the successive $\sigma_i^2$ are

$$\sigma_4^2 = \sigma_e^2, \quad \sigma_3^2 = (\sigma_e^2 + q\sigma_{tb}^2), \quad \sigma_2^2 = (\sigma_e^2 + q\sigma_{tb}^2 + rq\sigma_t^2), \quad \sigma_1^2 = (\sigma_e^2 + q\sigma_{tb}^2 + pq\sigma_b^2).$$

The respective $f_i$ are

$$f_4 = rp(q - 1), \quad f_3 = (r - 1)(p - 1), \quad f_2 = (p - 1), \quad f_1 = (r - 1).$$

Since the above sums of squares are independently distributed as $\chi^2\sigma_i^2$,
their joint frequency distribution is that of four independent $\chi^2$ distributions.† By
transforming from the distributions of $\chi^2$ to those of the V's, it can be
shown that the logarithm of the likelihood function (L) is

$$L = -\tfrac{1}{2}\left[\text{constant} + \sum_i f_i \log \sigma_i^2 + \sum_i \frac{f_iV_i}{\sigma_i^2}\right].$$

Hence the ML estimates are

$$\hat{\sigma}_i^2 = V_i, \qquad i = 1, 2, 3, 4.$$

And by subtraction we find that

$$\hat{\sigma}_e^2 = V_4, \quad \hat{\sigma}_{tb}^2 = \frac{V_3 - V_4}{q}, \quad \hat{\sigma}_t^2 = \frac{V_2 - V_3}{rq}, \quad \hat{\sigma}_b^2 = \frac{V_1 - V_3}{pq}.$$

† Note that we are studying only deviations from the sample mean. This is
equivalent to studying the likelihood of the four parts of the analysis of variance,
neglecting any information which $\bar{Y}$ furnishes about the variance components.

Since the mean squares are orthogonal, the variance of an estimate of a
variance component is simply a multiple of the sums of the variances of
the mean squares used in estimating this variance component. That is,
a variance component can be written in the form

$$s^2 = \frac{1}{c}\sum_i a_iV_i,$$

where $a_i = \pm 1$. Hence

$$\sigma^2(s^2) = \frac{1}{c^2}\sum_i \sigma^2(V_i) = \frac{2}{c^2}\sum_i \frac{\sigma_i^4}{f_i}, \qquad s^2(s^2) = \frac{2}{c^2}\sum_i \frac{V_i^2}{f_i + 2}.$$

For example,

$$s_b^2 = \frac{V_1 - V_3}{pq},$$

$$s^2(s_b^2) = \frac{s^2(V_1) + s^2(V_3)}{(pq)^2} = \frac{2}{(pq)^2}\left[\frac{V_1^2}{r + 1} + \frac{V_3^2}{(r - 1)(p - 1) + 2}\right].$$


The problem of estimating confidence limits for $\sigma_b^2$ is still not completely
solved. Some of the available procedures are indicated by Bross.
These are:

(i) When r and p are large, the distribution of $s_b^2$ approaches normality.
In this case the $(1 - \alpha)$ confidence limits are

$$s_b^2 - T_\alpha\, s(s_b^2) < \sigma_b^2 < s_b^2 + T_\alpha\, s(s_b^2),$$

where $P(|T| > T_\alpha) = \alpha$ and T is a normal deviate.

(ii) Satterthwaite suggested that $s^2$ was approximately distributed
as $\chi^2\sigma^2/f'$, with $f'$ degrees of freedom, where

$$f' = \left(\sum a_iV_i\right)^2 \Big/ \sum\left(a_i^2V_i^2/f_i\right).$$

Presumably it is best to use the nearest integer for $f'$. In example 10.2
it was shown that the $(1 - \alpha)$ confidence limits for $\sigma^2$ are

$$\frac{f's^2}{\chi_2^2} < \sigma^2 < \frac{f's^2}{\chi_1^2},$$

where $P(\chi^2 > \chi_2^2) = P(\chi^2 < \chi_1^2) = \alpha/2$. For $\sigma_b^2$,

$$f' = (V_1 - V_3)^2/[(V_1^2/f_1) + (V_3^2/f_3)].$$

Satterthwaite warned that when some of the a's are negative, as in our
case, caution must be exercised in the use of this $\chi^2$ approximation.

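(iii) If $s^2 = (V_i - V_j)/c$ and $\sigma^2 = E(s^2) > 0$,

$$F_0 = \frac{V_i}{V_j} = F_{ij}\frac{\sigma_i^2}{\sigma_j^2} = F_{ij}\left[1 + \frac{c\sigma^2}{\sigma_j^2}\right],$$

or

$$F_{ij} = \frac{F_0}{[1 + (c\sigma^2/\sigma_j^2)]},$$

where $\sigma^2$ is the variance component under consideration.

Let $F_2$ and $F_1$ be tabular values of $F$, with $f_i$ and $f_j$ degrees of freedom, such that

$$P(F > F_2) = P(F < F_1) = \alpha/2.$$

Hence

$$\frac{\alpha}{2} = P\left[F_0 > F_2\left(1 + \frac{c\sigma^2}{\sigma_j^2}\right)\right] = P\left[F_0 < F_1\left(1 + \frac{c\sigma^2}{\sigma_j^2}\right)\right].$$

If we solve inside the brackets for $\sigma^2$, we find that

$$\frac{\alpha}{2} = P\left[\sigma^2 < \frac{\sigma_j^2}{c}\left(\frac{F_0}{F_2} - 1\right)\right] = P\left[\sigma^2 > \frac{\sigma_j^2}{c}\left(\frac{F_0}{F_1} - 1\right)\right].$$

Hence

$$\frac{\sigma_j^2}{c}\left(\frac{F_0}{F_2} - 1\right) < \sigma^2 < \frac{\sigma_j^2}{c}\left(\frac{F_0}{F_1} - 1\right).$$

If we replace $\sigma_j^2/c$ by its estimate, $V_j/c = s^2/(F_0 - 1)$, the confidence limits for $\sigma^2$ are

$$\left(\frac{F_0/F_2 - 1}{F_0 - 1}\right)s^2 < \sigma^2 < \left(\frac{F_0/F_1 - 1}{F_0 - 1}\right)s^2.$$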

(iv) Using the fiducial probability concepts of R. A. Fisher, Bross derived the following $(1 - \alpha)$ fiducial limits for $\sigma^2$:

$$\left[\frac{(F_0/F_2) - 1}{F_2'(F_0/F_2) - 1}\right]s^2 < \sigma^2 < \left[\frac{(F_0/F_1) - 1}{F_1'(F_0/F_1) - 1}\right]s^2,$$

where $F_k' = F_k$ for ($f_i$ and $\infty$) degrees of freedom.

In (iii) and (iv), $F_1 = 1/F_2(f_j, f_i)$, where $F_2 = F_2(f_i, f_j)$ (see Exercise 7.29). In all cases, if the lower limit is negative, it is replaced by zero. This will occur when $F_0$ is nonsignificant at the $\alpha/2$ significance level ($F_0 < F_2$).

The problem of setting confidence limits for $\mu$ is also complicated, because one does not know how many degrees of freedom to use for $t$. The Satterthwaite approximation presumably can also be used here. For the given experiment,

$$\sigma^2(\bar{Y}) = (\sigma_1^2 + \sigma_2^2 - \sigma_3^2)/rpq.$$

Hence $s^2(\bar{Y}) = (V_1 + V_2 - V_3)/rpq$, with approximately $f'$ degrees of freedom, where

$$f' = \frac{(V_1 + V_2 - V_3)^2}{\dfrac{V_1^2}{r - 1} + \dfrac{V_2^2}{p - 1} + \dfrac{V_3^2}{(r - 1)(p - 1)}}.$$

Of course, if $r$ and $p$ are large, it is safe to use $t_{.05} = 2$.

The problem for a single treatment mean is slightly different, because

$$\sigma^2(\bar{T}_i) = [\sigma_1^2 + (p - 1)\sigma_3^2]/rpq.$$

Hence $s^2(\bar{T}_i) = [V_1 + (p - 1)V_3]/rpq$ with $f''$ degrees of freedom, where

$$f'' = \frac{[V_1 + (p - 1)V_3]^2}{\dfrac{V_1^2}{r - 1} + \dfrac{(p - 1)^2V_3^2}{(r - 1)(p - 1)}} = \frac{(r - 1)[V_1 + (p - 1)V_3]^2}{V_1^2 + (p - 1)V_3^2}.$$



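Example 22.1. Let us consider the example presented by Crump of a series of genetic experiments by J. W. Gowen on the number of eggs laid by each of 12 females, from 25 races of Drosophila melanogaster on the fourth day of laying, the whole experiment being carried out 4 times ($r = 4$, $p = 25$, $q = 12$). The analysis of variance was as follows:

| Source of variation | Degrees of freedom | Mean square | E(MS) |
|---|---|---|---|
| Experiments (blocks) | 3 | $V_1 = 46{,}659$ | $\sigma_1^2 = \sigma_e^2 + 12\sigma_{tb}^2 + 300\sigma_b^2$ |
| Races (treatments) | 24 | $V_2 = 3{,}243$ | $\sigma_2^2 = \sigma_e^2 + 12\sigma_{tb}^2 + 48\sigma_t^2$ |
| T × B | 72 | $V_3 = 459$ | $\sigma_3^2 = \sigma_e^2 + 12\sigma_{tb}^2$ |
| Females in subclasses | 1,100 | $V_4 = 231$ | $\sigma_4^2 = \sigma_e^2$ |

$$s_e^2 = 231, \qquad s^2(s_e^2) = \frac{2(231)^2}{1{,}102} = 97,$$

$$s_{tb}^2 = \frac{459 - 231}{12} = 19, \qquad s^2(s_{tb}^2) = \frac{2}{144}\left[\frac{(231)^2}{1{,}102} + \frac{(459)^2}{74}\right] = 40,$$

$$s_t^2 = \frac{3{,}243 - 459}{48} = 58, \qquad s^2(s_t^2) = \frac{2}{(48)^2}\left[\frac{(3{,}243)^2}{26} + \frac{(459)^2}{74}\right] = 354,$$

$$s_b^2 = \frac{46{,}659 - 459}{300} = 154, \qquad s^2(s_b^2) = 9{,}676.$$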
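The arithmetic of Example 22.1 can be retraced with a short script. This is only a sketch: the helper name `component` and the dictionary layout are illustrative choices, not part of the text, and the rule used is the one stated above, namely an estimate $(V_i - V_j)/c$ with estimated variance $(2/c^2)\sum V_i^2/(f_i + 2)$.

```python
# Mean squares V_i and degrees of freedom f_i from the analysis of variance
# of Example 22.1 (r = 4 experiments, p = 25 races, q = 12 females).
r, p, q = 4, 25, 12
V = {1: 46659.0, 2: 3243.0, 3: 459.0, 4: 231.0}
f = {1: r - 1, 2: p - 1, 3: (r - 1) * (p - 1), 4: r * p * (q - 1)}


def component(c, plus, minus=None):
    """Return (V_plus - V_minus)/c and (2/c^2) * sum of V_i^2/(f_i + 2).

    `component` is a hypothetical helper name used only for this sketch.
    """
    used = [plus] if minus is None else [plus, minus]
    estimate = (V[plus] - (V[minus] if minus is not None else 0.0)) / c
    variance = 2.0 / c**2 * sum(V[i] ** 2 / (f[i] + 2) for i in used)
    return estimate, variance


estimates = {
    "s_e^2": component(1, 4),
    "s_tb^2": component(q, 3, 4),
    "s_t^2": component(r * q, 2, 3),
    "s_b^2": component(p * q, 1, 3),
}
for name, (est, var) in estimates.items():
    print(f"{name}: estimate {est:.0f}, estimated variance {var:.0f}")
```

Rounded to whole numbers, the printed values should agree with the figures quoted in the example.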


The estimated variances of a general mean and a race mean with $r'$ experiments, $p'$ races, and $q'$ females per subclass are

$$s^2(\bar{Y}) = \frac{231}{r'p'q'} + \frac{19}{r'p'} + \frac{58}{p'} + \frac{154}{r'},$$

$$s^2(\bar{T}_i) = \frac{231}{r'q'} + \frac{19}{r'} + \frac{154}{r'} = \frac{231}{r'q'} + \frac{173}{r'}.$$

The most important component appears to be the variation from experiment to experiment ($\sigma_b^2$). Hence, in order materially to cut down the variance of a race mean, it is necessary to increase the number of experiments ($r$), since increasing $r$ decreases both parts of $s^2(\bar{T}_i)$.
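To see how strongly the number of experiments dominates, the two variance formulas can be evaluated for a few alternative designs. The sketch below assumes the component estimates 231, 19, 58, and 154 from the example; the function names and the alternative designs tried are illustrative only.

```python
def var_general_mean(r, p, q, s_e=231.0, s_tb=19.0, s_t=58.0, s_b=154.0):
    """Estimated variance of the general mean for r experiments, p races, q females."""
    return s_e / (r * p * q) + s_tb / (r * p) + s_t / p + s_b / r


def var_race_mean(r, q, s_e=231.0, s_tb=19.0, s_b=154.0):
    """Estimated variance of a single race mean for r experiments, q females."""
    return s_e / (r * q) + (s_tb + s_b) / r


# The actual design, then two hypothetical designs with the same number
# of females per race (r * q = 48) or with doubled q.
for r, q in [(4, 12), (8, 6), (4, 24)]:
    print(r, q, round(var_race_mean(r, q), 2))
print(round(var_general_mean(4, 25, 12), 2))
```

Doubling the experiments while halving the females per subclass roughly halves the variance of a race mean, whereas doubling the females alone changes it very little.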

Since $\sigma_b^2$ is so important, let us consider the different methods of setting 90 per cent confidence limits for it.

| Method | 5% lower limit | 5% upper limit | 10% upper limit |
|---|---|---|---|
| Normal (i) | 0 | 316 | |
| $\chi^2$ (ii) | 59 | 1,313 | 791 |
| F (iii) | 55 | 1,331 | 801 |
| Fiducial (iv) | 58 | 1,325 | 797 |

Some of the data needed for the above results were:

$$s(s_b^2) = 98.4, \quad f' = 3, \quad F_0 = 101.65,$$

$$\chi^2_{.05} = 7.81, \quad \chi^2_{.95} = .352, \quad \chi^2_{.90} = .584,$$

F.„5 = 2.74, F.55 = F.oo = N, 

- ii' - 5-T3- 

We computed the 10 per cent upper limit for the last three methods, assuming that the lower limit would be zero. Actually, when all the probability is put on the upper tail, the lower limit will be slightly negative for (iii) and (iv), but we have postulated that $\sigma^2 > 0$. When there are so few degrees of freedom for estimating a component such as $\sigma_b^2$, the confidence limits should be quite wide. The normal approximation is definitely unsatisfactory in this case. The other three estimates are remarkably close together. Bross gives some examples where this is not the case. He criticizes the $\chi^2$ method in one case of a nonsignificant $F_0$, for which methods (iii) and (iv) will give negative lower limits (assumed 0 since $\sigma^2 > 0$), whereas the $\chi^2$ lower limit is greater than zero. It might be advisable, when $F_0$ is nonsignificant, to put all of the probability on the upper tail. This procedure should be considered from a theoretical standpoint.
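Several entries of the table of limits can be retraced numerically. The sketch below covers method (i), method (ii), and the lower limit of method (iii). It uses the tabular values quoted in the text ($\chi^2_{.05} = 7.81$, $\chi^2_{.95} = .352$, $F_{.05} = 2.74$); the normal deviate 1.645 for a two-sided 10 per cent level is not quoted in the text and is supplied here as an assumption.

```python
import math

# Mean squares and degrees of freedom entering s_b^2 = (V1 - V3)/c.
V1, V3, f1, f3, c = 46659.0, 459.0, 3, 72, 300
s2 = (V1 - V3) / c
se = math.sqrt(2.0 / c**2 * (V1**2 / (f1 + 2) + V3**2 / (f3 + 2)))

# (i) Normal approximation; 1.645 is an assumed table value (two-sided 10%).
# A negative lower limit is replaced by zero.
normal = (max(0.0, s2 - 1.645 * se), s2 + 1.645 * se)

# (ii) Satterthwaite: approximate degrees of freedom, rounded to an integer.
f_prime = (V1 - V3) ** 2 / (V1**2 / f1 + V3**2 / f3)
f_int = round(f_prime)
chi2_05, chi2_95 = 7.81, 0.352  # tabular values for 3 df, as quoted above
satterthwaite = (f_int * s2 / chi2_05, f_int * s2 / chi2_95)

# (iii) Lower limit from the F method, with F_2 = F_.05(3, 72) = 2.74.
F0 = V1 / V3
F2 = 2.74
f_lower = max(0.0, (F0 / F2 - 1.0) / (F0 - 1.0) * s2)

print("normal:", [round(x) for x in normal])
print("chi-square:", [round(x, 1) for x in satterthwaite], "f' =", round(f_prime, 2))
print("F lower limit:", round(f_lower, 1))
```

The upper limits of (iii) and (iv) need further tabular $F$ values and are not attempted here.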

For this particular sample, $s^2(\bar{Y}) = 41.20$, and

$$f' = \frac{(49{,}443)^2}{\dfrac{(46{,}659)^2}{3} + \dfrac{(3{,}243)^2}{24} + \dfrac{(459)^2}{72}} = 3.4;$$

also, $s^2(\bar{T}_i) = 48.06$, and

$$f'' = \frac{3(57{,}675)^2}{(46{,}659)^2 + (24)(459)^2}.$$

Hence the 95 per cent confidence limits would be

$$\bar{Y} - (2.3)(6.4) < \mu < \bar{Y} + (2.3)(6.4),$$
$$\bar{T}_i - (2.1)(6.9) < \tau_i < \bar{T}_i + (2.1)(6.9).$$
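The approximate degrees of freedom follow from Satterthwaite's rule, $(\sum a_iV_i)^2 / \sum (a_iV_i)^2/f_i$. A minimal sketch, assuming that rule and the mean squares of Example 22.1 (the function name is hypothetical):

```python
def satterthwaite_df(terms):
    """Approximate df for sum(a*V); terms are (coefficient a, mean square V, df f)."""
    numerator = sum(a * v for a, v, _ in terms) ** 2
    denominator = sum((a * v) ** 2 / f for a, v, f in terms)
    return numerator / denominator


# General mean: V1 + V2 - V3; race mean: V1 + (p - 1) V3 with p = 25.
f_mean = satterthwaite_df([(1, 46659.0, 3), (1, 3243.0, 24), (-1, 459.0, 72)])
f_race = satterthwaite_df([(1, 46659.0, 3), (24, 459.0, 72)])
print(round(f_mean, 2), round(f_race, 2))
```

Both values lie between 3 and 5, so the $t$ multiplier used for the limits is sensitive to how the degrees of freedom are rounded.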
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22.3. Repeated Subsampling with Equal Numbers in the Subclasses. 

A sampling technique ordinarily used in sample surveys and in many sampling experiments in the physical and biological sciences is that of repeated subsampling, sometimes called nested sampling (see references 15 and 16). We shall assume that there are four tiers in the universe: A, B in A, C in B, D in C. Obviously there may be more or less, but we shall illustrate the methods with four. Assume that the number of possible samples from A is large enough so that the sample represents only a small segment of the tier; hence, regardless of the sampling rate within the selected A units, the sampling rate for all B, C, and D units will be small. Suppose that $a$ A units are selected, $b$ B units from each of the A units, $c$ C units from each B unit, and $d$ D units from each C unit, the total number of samples being $n = abcd$.


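$$Y_{ijkm} = \mu + \alpha_i + \beta_{ij} + \gamma_{ijk} + \delta_{ijkm},$$

where $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, c$; $m = 1, 2, \ldots, d$. All of the effects, except $\mu$, are assumed NID$(0, \sigma_i^2)$, where $\sigma_i^2 = \sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$, or $\sigma_d^2$ respectively. Hence

$$\bar{Y} = \frac{1}{abcd}\sum_i\sum_j\sum_k\sum_m Y_{ijkm}$$

and

$$\sigma^2(\bar{Y}) = \frac{\sigma_a^2}{a} + \frac{\sigma_b^2}{ab} + \frac{\sigma_c^2}{abc} + \frac{\sigma_d^2}{abcd}.$$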


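The analysis of variance is as follows:

| Source of variation | Degrees of freedom | Mean square | E(MS) |
|---|---|---|---|
| A | $a - 1$ | $V_1$ | $\sigma_1^2 = \sigma_d^2 + d\sigma_c^2 + cd\sigma_b^2 + bcd\sigma_a^2$ |
| B in A | $a(b - 1)$ | $V_2$ | $\sigma_2^2 = \sigma_d^2 + d\sigma_c^2 + cd\sigma_b^2$ |
| C in B | $ab(c - 1)$ | $V_3$ | $\sigma_3^2 = \sigma_d^2 + d\sigma_c^2$ |
| D in C | $abc(d - 1)$ | $V_4$ | $\sigma_4^2 = \sigma_d^2$ |
| Total | $abcd - 1$ | | |
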

The sum of squares for A is computed like the block and treatment sums of squares in Sec. 22.2. The remaining sums are simply within sums of squares. Hence,

$$\text{SSA} = \frac{\sum_i A_i^2}{bcd} - \frac{G^2}{abcd}, \qquad \text{SSB} = \frac{\sum_i\sum_j B_{ij}^2}{cd} - \frac{\sum_i A_i^2}{bcd}, \quad \text{etc.},$$

where

$$A_i = \sum_j\sum_k\sum_m Y_{ijkm}, \qquad B_{ij} = \sum_k\sum_m Y_{ijkm}, \quad \text{etc.}$$

The expected values of the mean squares are computed as in Sec. 22.2. All of the theory presented in that section pertaining to the estimates of the variance components and their variances and confidence limits carries over here, except that each variance component is estimated from its mean square and the one just below. For example,

$$s_a^2 = \frac{V_1 - V_2}{bcd}.$$

However, the confidence limits for $\mu$ are easier to compute, because $s^2(\bar{Y}) = V_1/n$ with $(a - 1)$ degrees of freedom.

Example 22.2. Suppose we had a sample of 40 townships with 4 areas 
per township and 2 farms from each area, giving a total of 320 farms, to 
estimate the average number of acres of corn per farm. The mean corn 
yield was 40 bushels per acre. The analysis of variance for the sample 
data was: 


| Division | Degrees of freedom | Mean square | E(MS) |
|---|---|---|---|
| Townships | 39 | 320 | $\sigma_f^2 + 2\sigma_a^2 + 8\sigma_t^2$ |
| Areas in townships | 120 | 240 | $\sigma_f^2 + 2\sigma_a^2$ |
| Farms in areas | 160 | 100 | $\sigma_f^2$ |
| Total | 319 | | |

The expectations of the mean squares are given on the right, where $\sigma_f^2$ is the farm-to-farm variance component, $\sigma_a^2$ the area component, and $\sigma_t^2$ the township component. The estimates of these variance components are

$$s_f^2 = 100, \qquad s_a^2 = \frac{240 - 100}{2} = 70, \qquad s_t^2 = \frac{320 - 240}{8} = 10.$$

The estimated variance of the mean corn yield for $b'$ townships, $c'$ areas per township, and $d'$ farms per area would be

$$s^2(\bar{Y}) = \frac{10}{b'} + \frac{70}{b'c'} + \frac{100}{b'c'd'}.$$

For the given sample, $s^2(\bar{Y}) = 1.00$. Hence the 95 per cent confidence limits for $\mu$ are

$$40 - 2.0 < \mu < 40 + 2.0$$

or

$$38 < \mu < 42.$$
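For equal subclass numbers the estimation rule (each component from its mean square and the one just below) is easy to mechanize. The following sketch applies it to the mean squares of Example 22.2; the function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def nested_components(mean_squares, sizes):
    """Variance components for a balanced nested design.

    mean_squares: top tier first; sizes: number of units per parent at each tier.
    Component t is (MS_t - MS_{t+1}) divided by the number of observations
    per unit of tier t.
    """
    comps = []
    for t, ms in enumerate(mean_squares):
        below = mean_squares[t + 1] if t + 1 < len(mean_squares) else 0.0
        comps.append((ms - below) / prod(sizes[t + 1:]))
    return comps


def var_of_mean(comps, sizes):
    """Variance of the overall mean: each component over the number of its units."""
    return sum(s / prod(sizes[: t + 1]) for t, s in enumerate(comps))


sizes = [40, 4, 2]  # townships, areas per township, farms per area
comps = nested_components([320.0, 240.0, 100.0], sizes)
print(comps)                      # township, area, farm components
print(var_of_mean(comps, sizes))  # equals V_1/n = 320/320
```

Calling `var_of_mean(comps, [b, c, d])` with other sizes gives the estimated variance for an alternative sampling plan.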
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,„bcli«... A«umelhereate«Auml.,eMl.«ia>"i““P“(4"- 

. „„i. h.». B u„i j,B 


Source of 
variation 


Degrees of 
freedom 


Coefficients of variance components 

in E(US) 


: > Eli"--'- ??-v -1 

B1.J i|b^. > tti-v-'iV''" 

cini. n».' 2 ‘- ‘ iii”'"'" 

ij ^ ^ ^ * 

DinC ^ 


A/m) - (l/n) ^ - 

2 “- 


OVJHiLrilM, and/if 


_ n /niitl - (l/«-.'i) . 

*■ n-E*' 


The sums of squares in the analysis of variance 

V Af 


'e computed as follows: 


--itxa-xxs 


A .y—i. comprtiM 

coefficients of the variance components in the expected mean squares. This will be demonstrated in Example 22.3.

† This section may be omitted without upsetting the continuity of the book.

Example 22.3. Cochran presents the analysis of a 1937 enumeration of commercial wheat fields in 6 districts of Great Britain. The sampling plan was to select a number of farms from each district, 1 or 2 fields per farm and 2 paths per field, each path having 6 bulked samples. In addition, from the only farm in district III, 2 varieties were sampled in each of the 3 fields (2 paths per variety per field). The number of farms and fields per farm for each district were:

| District | No. farms | No. fields per farm† | Total fields | Total paths |
|---|---|---|---|---|
| I | 2 | 2(2) | 4 | 8 |
| II | 2 | 2, 1 | 3 | 6 |
| III‡ | 1 | 3 | 3 | 12 |
| IV | 9 | 2(2), 1(7) | 11 | 22 |
| V | 1 | 2 | 2 | 4 |
| VI | 10 | 2(3), 1(7) | 13 | 26 |
| Total | 25 | | 36 | 78 |

† Numbers in parentheses refer to number of farms with this many fields; those without parentheses have one farm with this many fields.
‡ Two varieties were sampled in each field.

The analysis of variance, excluding the 3 degrees of freedom for varieties, is presented below in terms of hundredweights (112 pounds) per acre (based on the means of the six bulked samples).†


| Source of variation | Degrees of freedom | Mean square | E(MS) |
|---|---|---|---|
| Districts (A) | 5 | 10.28 | $\sigma_1^2 = \sigma_d^2 + 2.34\sigma_c^2 + 4.90\sigma_b^2 + 11.96\sigma_a^2$ |
| Farms (B) in A | 19 | 6.59 | $\sigma_2^2 = \sigma_d^2 + 2.00\sigma_c^2 + 2.58\sigma_b^2$ |
| Fields (C) in B | 11 | 3.03 | $\sigma_3^2 = \sigma_d^2 + 2.36\sigma_c^2$ |
| Paths (D) in C | 39 | .825 | $\sigma_4^2 = \sigma_d^2$ |

Set up a diagram like the following to help in computing the coefficients of the variance components in E(MS):

† Cochran presents his analysis in terms of the sum of four sets of 3 samples each, each set being a mean of the 3 samples in the set; hence his mean squares are eight times as large as those above.

| | No. units | No. samples | District I | II | III | IV | V | VI |
|---|---|---|---|---|---|---|---|---|
| Total | 1 | $n$ | 78 (all districts) | | | | | |
| Districts | 6 | $n_i$ | 8 | 6 | 12 | 22 | 4 | 26 |
| Farms | 25 | $n_{ij}$ | 4(2) | 4(1), 2(1) | 12(1) | 4(2), 2(7) | 4(1) | 4(3), 2(7) |
| Fields | 36 | $n_{ijk}$ | 2(4) | 2(2), 2(1) | 4(3) | 2(4), 2(7) | 2(2) | 2(6), 2(7) |

The district columns give the distribution of the samples.† 

† Numbers in parentheses are number of farms or fields having this many samples.

This diagram reads as follows for, say, the last district: there are 26 samples from this district, 4 from each of 3 farms and 2 from each of 7 farms; there are 13 fields each with 2 samples.

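The expectations of the various sums of squares are handled as follows:

| Sum of squares | Coefficient of $\sigma_a^2$ | Coefficient of $\sigma_b^2$ | Coefficient of $\sigma_c^2$ |
|---|---|---|---|
| $G^2/n$ | $\sum_i n_i^2/n = 1{,}420/78 = 18.21$ | $\sum_i\sum_j n_{ij}^2/n = 348/78 = 4.46$ | $\sum_i\sum_j\sum_k n_{ijk}^2/n = 180/78 = 2.31$ |
| $\sum_i A_i^2/n_i$ | $n = 78$ | $\sum_i\sum_j n_{ij}^2/n_i = 28.98$ | $\sum_i\sum_j\sum_k n_{ijk}^2/n_i = 14.00$ |
| $\sum_i\sum_j B_{ij}^2/n_{ij}$ | 78 | 78 | $\sum_i\sum_j\sum_k n_{ijk}^2/n_{ij} = 52.00$ |
| $\sum_i\sum_j\sum_k C_{ijk}^2/n_{ijk}$ | 78 | 78 | 78 |
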


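To illustrate the computing procedure, consider a few examples:

$$\sum_i\sum_j \frac{n_{ij}^2}{n_i} = \frac{(16)(2)}{8} + \frac{(16)(1) + (4)(1)}{6} + \frac{144}{12} + \frac{(16)(2) + (4)(7)}{22} + \frac{16}{4} + \frac{(16)(3) + (4)(7)}{26} = 28.98,$$

$$\sum_i\sum_j\sum_k \frac{n_{ijk}^2}{n_{ij}} = \frac{(4)(18)}{4} + \frac{(4)(15)}{2} + \frac{(16)(3)}{12} = 52.$$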
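The coefficients in E(MS) for this unbalanced design can be retraced from the sampling diagram alone. In the sketch below the nested list `design` is an encoding of the district table chosen for illustration (one inner list of field sample counts per farm), and the coefficient formulas are differences of the tabulated sums divided by the degrees of freedom, which is the usual computing rule assumed here.

```python
# One list per district; each farm is a list of samples (paths) per field.
design = [
    [[2, 2], [2, 2]],            # district I
    [[2, 2], [2]],               # district II
    [[4, 4, 4]],                 # district III (two varieties per field)
    [[2, 2]] * 2 + [[2]] * 7,    # district IV
    [[2, 2]],                    # district V
    [[2, 2]] * 3 + [[2]] * 7,    # district VI
]

n = sum(x for d in design for farm in d for x in farm)
n_farms = sum(len(d) for d in design)
n_fields = sum(len(farm) for d in design for farm in d)

# S["b/i"] is the sum of n_ij^2 / n_i, and so on for the other keys.
S = dict.fromkeys(["a/n", "b/n", "c/n", "b/i", "c/i", "c/ij"], 0.0)
for d in design:
    n_i = sum(sum(farm) for farm in d)
    S["a/n"] += n_i**2 / n
    for farm in d:
        n_ij = sum(farm)
        S["b/n"] += n_ij**2 / n
        S["b/i"] += n_ij**2 / n_i
        for n_ijk in farm:
            S["c/n"] += n_ijk**2 / n
            S["c/i"] += n_ijk**2 / n_i
            S["c/ij"] += n_ijk**2 / n_ij

df_a, df_b, df_c = len(design) - 1, n_farms - len(design), n_fields - n_farms
coef = {
    "districts": ((S["c/i"] - S["c/n"]) / df_a, (S["b/i"] - S["b/n"]) / df_a, (n - S["a/n"]) / df_a),
    "farms": ((S["c/ij"] - S["c/i"]) / df_b, (n - S["b/i"]) / df_b),
    "fields": ((n - S["c/ij"]) / df_c,),
}
for source, values in coef.items():
    print(source, [round(v, 2) for v in values])
```

Each tuple lists the coefficients of $\sigma_c^2$, $\sigma_b^2$, $\sigma_a^2$ (as far as they occur) in the corresponding expected mean square.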


It should be noted that, for unequal subclass numbers, the mean squares are no longer distributed as simply $\chi^2\sigma^2/f$ but as sums like $\lambda_1\chi_1^2 + \lambda_2\chi_2^2 + \cdots$, where the $\lambda$'s are functions of the variance components and the number of observations. Hence the confidence limits presented in Sec. 22.2 cannot be used here. The theory applicable to this problem is beyond this book.


EXERCISES 

22.1. In Example 22.1, suppose that $C_e = 10$, $C_p = 120$, and $C = 960$. Solve for $r'$ and $q'$ to minimize $s^2(\bar{T}_i)$. What is $s^2(\bar{T}_i)$ using these values of $r'$ and $q'$? Compare this result with that obtained in the actual experiment ($r = 4$, $q = 12$).

22.2. An experiment was run on the average stem length (in inches) 
per plant of snapdragons. Seven soil types were used in each of 3 replica¬ 
tions, 18 plants being selected at random from each plot. The analysis 
of variance was as follows: 


| Source of variation | Degrees of freedom | Sum of squares |
|---|---|---|
| Replications | | 39.04 |
| Soil types | | 103.15 |
| Interaction | | 19.74 |
| Sampling error | | 507.30 |

(a) Fill in the degrees of freedom, and compute the mean squares.

(b) If these replications and soil types can be assumed to be randomly selected from large populations, what are the expectations of the mean squares?

(c) Compute the estimates of the variance components in (b).

(d) The sample mean height was 32.56 inches per plant. What is the variance of this mean and the 95 per cent confidence limits for the true mean, under the assumptions of (b)?

(e) What is the estimated variance of a mean if $p'$ soil types, $r'$ replications, and $q'$ plants per plot were used?

22.3. An experiment was conducted to estimate the genetic and environmental variances in corn, using 192 progenies produced from 48 males with 192 females (4 females per male). The field layout was in 12 blocks of 32 plots each, each block having 4 male parents, each crossed with 4 different females, and each cross duplicated. The yields were based on the pounds of grain for 10 guarded plants per plot (plants having another plant on each side of the row). An estimate of the plant-to-plant variability was obtained from 23 of the plots (230 plants). The analysis of variance was as follows:

| Source of variation | Degrees of freedom | Mean square | E(MS)† |
|---|---|---|---|
| Blocks | 11 | .153 | |
| Duplications in blocks | 12 | .063 | |
| Males in blocks | 36 | .167 | $\sigma^2 + 10\sigma_p^2 + 20\sigma_f^2 + 80\sigma_m^2$ |
| Females in males in blocks | 144 | .069 | $\sigma^2 + 10\sigma_p^2 + 20\sigma_f^2$ |
| Parents × duplications in blocks | 180 | .031 | $\sigma^2 + 10\sigma_p^2$ |
| Plants in plots | 207 | .0153 | $\sigma^2$ |

† $\sigma^2$ is the plant-to-plant variance, $\sigma_p^2$ the plot-to-plot variance (parent × duplicate interaction), $\sigma_f^2$ the added variance due to females, and $\sigma_m^2$ the added variance due to males.

(a) Derive the coefficients of the variance components in E(MS).

(b) Compute estimates of these variance components.

(c) How would you test the hypothesis that $\sigma_m^2 = \sigma_f^2$?

(d) Derive approximate confidence limits for $\sigma_f^2$ and $\sigma_m^2$.

22.4. Assume we have $p$ treatments and $r$ blocks with $a_ib_j$ samples for the $i$th treatment and the $j$th block, with all effects random except the mean.

(a) Set up the analysis-of-variance table.

(b) Derive formulas for the expectations of the mean squares in terms of the $a_i$ and $b_j$.

(c) Suppose that $a_ib_j = c_i$. How does this affect the results in (b)?

22.5. Prove that

$$\sum_i\sum_j\left[(TB)_{ij} - q\mu - \frac{T_i - rq\mu}{r} - \frac{B_j - pq\mu}{p} + \frac{G - rpq\mu}{rp}\right]^2 = \sum_i\sum_j\left[(TB)_{ij} - q\mu\right]^2 - \frac{\sum_i (T_i - rq\mu)^2}{r} - \frac{\sum_j (B_j - pq\mu)^2}{p} + \frac{(G - rpq\mu)^2}{rp}.$$

22.6. (a) Prove that $(B_j - G/r)$ and $(T_i - G/p)$ are not correlated.

(b) Prove that each is not correlated with $[(TB)_{ij} - T_i/r - B_j/p + G/rp]$.

22.7. Show that $E(V_i^2) = (f_i + 2)\sigma_i^4/f_i$.

22.8. Show that the ML solution in Sec. 22.2 is correct. What happens if you consider $S(Y - \mu)^2$ instead of $S(Y - \bar{Y})^2$?

22.9. (a) Given $V_i - V_j = c\chi^2\sigma^2/f$, where $V_i = \chi_i^2\sigma_i^2/f_i$, $V_j = \chi_j^2\sigma_j^2/f_j$, $c\sigma^2 = \sigma_i^2 - \sigma_j^2$, and each chi-square has as degrees of freedom its denominator. Show by equating the variances of the two sides of this equation that

$$f = \frac{(c\sigma^2)^2}{(\sigma_i^4/f_i) + (\sigma_j^4/f_j)}.$$

Also, show that $E(V_i - V_j) = E(c\chi^2\sigma^2/f)$.

(b) How does this result connect with Satterthwaite's approximation for confidence limits on $\sigma^2$?

22.10. Marcuse presents the following data on the moisture content of 2 sample cheeses from each of 3 different lots with 2 subsamples per sample (remember samples 1 and 2 are not the same for the different lots):

| Sample | Lot I | Lot II | Lot III |
|---|---|---|---|
| 1 | 39.02 | 35.74 | 37.02 |
| | 38.79 | 35.41 | 36.00 |
| 2 | 38.96 | 35.58 | 35.70 |
| | 39.01 | 35.52 | 36.04 |

(a) Set up the analysis of variance for these data with the expected values of the mean squares, assuming everything random except the mean.

(b) Estimate the values of the variance components.

(c) Determine 90 per cent confidence limits for the "lots" component by the four methods outlined above.

(d) Given that the costs are 10 per lot, 3 per cheese per lot, and 1 per subsample. Assume that 2 subsamples per cheese are to be used and that the total cost is approximately 100, and determine the number of lots and cheeses per lot to minimize the variance of the general mean. Could a lower variance be obtained by using other than 2 subsamples per cheese?

(e) What are the 95 per cent confidence limits for the mean?

22.11. Show that the expected values of the mean squares in Sec. 22.3 
are correct. 

22.12. Derive the ML estimates for nested sampling with equal numbers taken from each class.

22.13. In a 1940 Iowa AAA corn-acreage study, 2 sections were selected from each of 1,617 townships. The analysis of variance of the corn acreage per section was:

| Source of variation | Degrees of freedom | Mean square |
|---|---|---|
| Between townships | | 6,511.9 |
| Within townships | | 1,954.3 |

(a) Fill in the degrees of freedom, and determine the expectations of the mean squares.

(b) Estimate the variance components.

(c) What is the variance of a sample mean if $r'$ townships and $s'$ sections per township were sampled?

(d) Determine 90 per cent confidence limits for the variance components.

22.14. Rigney and Reed studied some of the factors affecting the variability of estimates of various soil properties. They took samples from 20 fields (A), 2 sections (B) from each field, 2 samples (C) consisting of a composite of 20 borings from each section, and then 2 subsamples (D) from each sample. ($a = 20$, $b = 2$, $c = 2$, $d = 2$.) The analysis of variance for several of the properties is presented below:

| Source of variation | Degrees of freedom | Calcium | Magnesium | Potash | Organic matter |
|---|---|---|---|---|---|
| A | | 19.19 | .1809 | .0405 | 12.003 |
| B in A | | 3.59 | .0545 | .0076 | 7.674 |
| C in B | | .30 | .0080 | .0011 | .275 |
| D in C | | .01 | .0005 | .0001 | .002 |

The last four columns are mean squares.

(a) Fill in the degrees-of-freedom column.

(b) Set up the model for this sample and then the expectations of the mean squares in terms of the variance components.

(c) Estimate the variance components for one of the properties and the variance of the general mean for this sample. What is the variance of a mean of $a'$ fields, $b'$ sections per field, $c'$ samples per section, and $d'$ subsamples per sample?

22.15. (a) In Example 22.2, suppose it cost $2.70 on the average to get to a given township, $2.10 extra to cover each area in the townships selected, and $3 extra to secure and analyze the data from each farm selected in the sample. Determine values of b′, c′, and d′ to minimize s²(Ȳ) if the total cost is fixed at $1,890. What is s²(Ȳ) for this sampling plan?

(b) Suppose it was decided to set d′ = 2. Determine values of b′ and c′ to minimize s²(Ȳ) under this condition, and compute s²(Ȳ).

22.16. For a discussion of sampling from finite populations, see references 19 and 20.

22.17. (a) In Example 22.3, show how the expectations of the mean squares are computed from the expectations of the sums of squares.

(b) Compute the estimates of the variance components in this example.

(c) In this example, the six districts constituted the population. Hence what would you say regarding $\sigma_a^2$?

(d) What would be the expected value and the variance of the general mean from these data? Of the mean for district I?

(e) The six bulked samples actually consisted of two sets of three samples each. Show how the analysis would change if the computations were based on the means of the three bulked samples per set and a component for sets was put into the model. The actual sum of squares between sets in paths was 99.31.

22 . 18 . In Exercise 18.4, suppose that the 13 markets were chosen at 
random from a large population of markets. 

(a) Set up the mathematical model for this sample and the expected 
values of the mean squares in the analysis of variance. Show that the 
expected value of the mean square for markets is where is 

the variance component for markets and 


k = 


12 


69 - 


82 q_ 02 q_ . . . 92 

■... W 


(b) Estimate the variance components. 

(c) What would be the estimated variance of the mean of a sample 
from r' markets with q' sellers at each market? 

(d) Suppose it costs $10 to visit a market and $1 to enumerate each 
seller, what is the optimum number of markets and sellers if it is desired 
to obtain 100 schedules? How much will it cost to obtain this many 
schedules? 

22.19. An experiment was to be designed to test various molds for their efficiency in the production of streptomycin. Before setting up an over-all experiment, a preliminary experiment was set up to examine the various sources of variability in the production and assay process. There are five stages in this process: the initial incubation stage in a test tube (or slant, as it is generally called), a primary inoculation period in a petri dish, a secondary inoculation period, a fermentation period in a bath, and the final assay of the amount of streptomycin produced. Several sampling plans could be devised to estimate the amount of variability contributed at each stage. Two of these plans are as follows:

(i) Consider 5 test tubes, draw 2 samples from each test tube for the primary inoculation, 2 samples from each primary for the secondary inoculation, 2 samples from each secondary for the fermentation, and assay 2 samples from each fermentation bath.

(ii) Use 2 test tubes as in (i); 2 test tubes the same, except only 1 assay per fermentation; 4 test tubes the same, except only one fermentation per secondary and one assay per fermentation; 8 test tubes, with two primaries each and one of each stage henceforth.

(a) Show that there are 80 assays for each plan.

(b) Set up the analysis of variance for both plans, and determine the expected values of the mean squares, assuming all effects are random.

(c) Which of these two plans would you recommend if one did not know in advance the relative magnitude of the variance components?
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CHAPTER 23 


ANALYSIS OF DATA WITH BOTH RANDOM AND FIXED EFFECTS 

(MIXED MODEL) 

23.1. Randomized Complete Blocks. In this section we shall consider the same model as in Sec. 22.2 except that the $\tau_i$ are assumed to be fixed; in other words, inferences are to be made regarding only these particular $p$ treatments. We have made a basic change from the model of Sec. 18.3, in that the blocks are assumed to be representative of a larger universe of environmental conditions than those of the particular experiment being analyzed and we have added a (treatment × block) effect to the model.

In Sec. 18.3, it was necessary to assume that there was no real TB interaction if the blocks were assumed fixed and MS(TB) was to be used as the error mean square. If the true model for Sec. 18.3 was

$$Y_{ij} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ij},$$

with fixed block and interaction effects, the expected value of the error mean square, $s^2$, would have been $\sigma_e^2 + \theta_1(\tau\beta)$, where

$$\theta_1(\tau\beta) = \sum\sum(\tau\beta)_{ij}^2/(r - 1)(p - 1).$$

Hence since $E(\text{MST}) = \sigma_e^2 + r\theta(\tau)$, there would have been no real error term to test $H_0$: $\{\tau_i = 0\}$, because under $H_0$ the expected value of the denominator of $F$ would have been greater than the expected value of the numerator; in other words, $F$ would have been too small, and too few significant results would have been posted.

If the experimenter is willing to assume that the block effects are random, he finds that

$$E(s^2) = \sigma_e^2 + \sigma_{tb}^2, \qquad E(\text{MST}) = \sigma_e^2 + \sigma_{tb}^2 + r\theta(\tau).$$

Hence MST/$s^2$ is distributed as $F$ under the null hypothesis, $H_0$: $\{\tau_i = 0\}$, and it is not necessary to assume a nonexistent interaction.

But if more than one sample is secured for each treatment in each block, that is, $q$ samples, the experimenter can assume fixed block effects and still have a legitimate error term even if there is a fixed interaction. In this case, we have an estimate of the sampling error, $\sigma_e^2$. This estimate is MSE, with $rp(q - 1)$ degrees of freedom. Hence MST/MSE is distributed as $F$ under the null hypothesis of no treatment effects.

Let us summarize these statements with a table showing the expectations of the treatment (T), treatment × block (TB), and sampling (E) mean squares for various experimental conditions and assumptions.†

| Assumptions | Mean square | $(\tau\beta) = 0$ | $(\tau\beta) \neq 0$, blocks fixed | $(\tau\beta) \neq 0$, blocks random |
|---|---|---|---|---|
| One sample per treatment per block ($q = 1$) | T | $\sigma_e^2 + r\theta(\tau)$ | $\sigma_e^2 + r\theta(\tau)$ | $\sigma_e^2 + \sigma_{tb}^2 + r\theta(\tau)$ |
| | TB | $(\sigma_e^2)$ | $\sigma_e^2 + \theta_1(\tau\beta)$ | $(\sigma_e^2 + \sigma_{tb}^2)$ |
| $q$ samples per treatment per block | T | $\sigma_e^2 + rq\theta(\tau)$ | $\sigma_e^2 + rq\theta(\tau)$ | $\sigma_e^2 + q\sigma_{tb}^2 + rq\theta(\tau)$ |
| | TB | (pool) | $\sigma_e^2 + q\theta_1(\tau\beta)$ | $(\sigma_e^2 + q\sigma_{tb}^2)$ |
| | E | (pool) | $(\sigma_e^2)$ | $\sigma_e^2$ |

The proper error term is indicated in parentheses. Note that for $(\tau\beta) = 0$ and $q > 1$, TB and E are both estimates of $\sigma_e^2$ and should be pooled to obtain the error mean square; however, if the experimenter is not sure that $(\tau\beta) = 0$, he had better use TB as the error mean square. For $(\tau\beta) \neq 0$ and blocks fixed, there is no error term unless $q > 1$.

As mentioned before, few experimenters wish to confine their con¬ 
clusions to the particular blocks used in the experiment. Hence they 
wish to assume random block effects. But if they assume random 
block effects and do not wish to assume nonexistent interactions, they 
can obtain an error term to test treatments even if only one sample is 
obtained for each treatment per block. And this error term is the same 
error term we have used all the time in Chaps. 17 to 21, the TB inter¬ 
action. But if the particular blocks used in the experiment cannot be 
considered to be representative of a wider area, the experimenter must 
have more than one sample per treatment per block in order to obtain an 
error term. 

Now let us proceed with an experiment using the model for this section:

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk},$$

with random block, interaction, and sampling effects and fixed treatment effects ($p$ treatments, $r$ blocks, and $q$ samples per treatment per block). Such an experiment is designed to estimate treatment differences and confidence limits for the true differences and to make tests of these differences, as well as to assess several sources of variability which cause fluctuations in the yields of a given treatment. We are seldom interested in the general mean, contrary to the importance of this mean in Sec. 22.2. And we generally are interested in only the variance components which affect treatment comparisons, because we want to plan experiments to minimize the variance of treatment differences for fixed costs or amount of experimental material.

† It does not make any difference whether blocks are fixed or not if $(\tau\beta) = 0$. There is no sampling mean square if $q = 1$.

We have presented the expectations of the T, TB, and E mean squares for our present model [blocks random and $(\tau\beta) \neq 0$]. There is some controversy over the expectation of the blocks mean square, but we shall proceed as follows:

$$(B_j - pq\mu) = q\sum_i \tau_i + pq\beta_j + q\sum_i (\tau\beta)_{ij} + \sum_i\sum_k \epsilon_{ijk}.$$

We know that $\sum_i \tau_i = 0$ from the assumption of fixed treatment effects. It seems reasonable to also assume that

$$\sum_i (\tau\beta)_{ij} = 0,$$

since all the $(\tau\beta)_{ij}$ (for a fixed $j$) in the population are in the sample. (See reference 1 for a similar approach.) Naturally the sum over $j$ for a fixed $i$ is not zero, because the $r$ blocks in this experiment are assumed to be a random sample from a larger population. Similarly

$$(G - rpq\mu) = pq\sum_j \beta_j + \sum_i\sum_j\sum_k \epsilon_{ijk},$$

$$\text{SSB} = \frac{\sum_j (B_j - pq\mu)^2}{pq} - \frac{(G - rpq\mu)^2}{rpq}.$$

Hence

$$E(\text{SSB}) = pq(r - 1)\sigma_b^2 + (r - 1)\sigma_e^2 \quad\text{and}\quad E(\text{MSB}) = pq\sigma_b^2 + \sigma_e^2.$$

Because of the above restriction on the $(\tau\beta)$ effects, we shall use the same definition used for nested sampling from a finite population, for example, reference 2, to define $\sigma_{tb}^2$ as follows:

$$\sigma_{tb}^2 = \sum_i (\tau\beta)_{ij}^2/(p - 1).$$

Using this definition of $\sigma_{tb}^2$, it is easy to show that the expectations of MS(TB) and MSE are the same as in Sec. 22.2, as stated earlier in this section. We see, for example, that

$$E\Big[\sum_{ijk}(Y_{ijk} - \mu - \tau_i)^2\Big] = rpq\sigma_b^2 + r(p - 1)q\sigma_{tb}^2 + rpq\sigma_e^2.$$

There are certain difficulties about the variance of $\bar{Y}$ when the treatments are finite; presumably the finite population correction should be used here, but we shall neglect it in the discussion which follows. Perhaps some of these theoretical difficulties would be resolved if we used the intraclass correlation concept, which is used by Hansen and Hurwitz in considering finite populations for nested sampling.

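Since the $\{\tau_i\}$ are assumed fixed,

$$E(\text{SST}) = rq\sum_i \tau_i^2 + (p - 1)(q\sigma_{tb}^2 + \sigma_e^2),$$

$$E(\text{MST}) = rq\theta(\tau) + \sigma_3^2,$$

where $\theta(\tau) = \sum_i \tau_i^2/(p - 1)$ and $\sigma_3^2 = q\sigma_{tb}^2 + \sigma_e^2$, as in Sec. 22.2, and $\sigma_4^2 = \sigma_e^2$.

Hence the analysis of variance for this mixed model is as follows:

| Source of variation | Degrees of freedom | Mean square | E(MS) |
|---|---|---|---|
| Blocks | $r - 1$ | MSB $= V_1$ | $\sigma_e^2 + pq\sigma_b^2 = \sigma_1^2$ |
| Treatments | $p - 1$ | MST $= V_2$ | $\sigma_3^2 + rq\theta(\tau)$ |
| T × B | $(r - 1)(p - 1)$ | MS(TB) $= V_3$ | $\sigma_e^2 + q\sigma_{tb}^2 = \sigma_3^2$ |
| Samples | $rp(q - 1)$ | MSE $= V_4$ | $\sigma_e^2 = \sigma_4^2$ |
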

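Since the $\tau$'s are fixed,

$$\sigma^2(\bar{Y}) = \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{rp} + \frac{\sigma_e^2}{rpq} = \frac{\sigma_1^2 + \sigma_3^2 - \sigma_4^2}{rpq}, \qquad s^2(\bar{Y}) = \frac{V_1 + V_3 - V_4}{rpq}.$$

The variance of a given treatment mean, $\bar{T}_i$, is

$$\sigma^2(\bar{T}_i) = \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{r} + \frac{\sigma_e^2}{rq} = \frac{\sigma_1^2 + p\sigma_3^2 - \sigma_4^2}{rpq}, \qquad s^2(\bar{T}_i) = \frac{V_1 + pV_3 - V_4}{rpq}.$$

But the principal comparison desired in this experiment is the difference, $d$, between two treatment means,

$$\sigma^2(d) = 2\left(\frac{\sigma_e^2}{rq} + \frac{\sigma_{tb}^2}{r}\right) = \frac{2\sigma_3^2}{rq}, \qquad s^2(d) = \frac{2V_3}{rq}.$$

The first two comparisons have the degree-of-freedom complication mentioned in Sec. 22.2, but not $\sigma^2(d)$. The importance of the variance components can be assessed as in Sec. 22.2 and confidence limits approximated by any of the four methods mentioned in that section.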

In the above analysis, $V_2$ is not distributed as $\chi^2\sigma^2/f$ because

$$E(T_i - rq\mu) = rq\tau_i \neq 0.$$

$V_2$ is distributed as a noncentral $\chi^2$ [$E(V_2) = \sigma_3^2 + \text{constant}$] with a variance of

$$\frac{4\sigma_3^2}{f_2}\left[\frac{\sigma_3^2}{2} + rq\theta(\tau)\right].\dagger$$

It can be shown that an unbiased estimate of the variance of $V_2$, which consists of both random and fixed components, is

$$s^2(V_2) = \frac{2}{f_2}\left[2V_2V_3 - \frac{f_3V_3^2}{f_3 + 2}\right].$$


In general, if the random component consists of several sources of variation, which must be estimated from more than one mean square, we can write for the mean square ($V$) with both random and fixed components

$$E(V) = \sigma_r^2 + \lambda\theta,$$

where $\sigma_r^2$ is the random and $\theta$ the fixed component. We can write

$$\sigma_r^2 = \sum_j a_j\sigma_j^2,$$

where $a_j = \pm 1$ and $\hat\sigma_j^2 = V_j$. That is, $\sigma_r^2$ can be thought of as the sum and difference of several independent random variances, each estimated by a single mean square in the analysis of variance. Let $V = V_r + V_s$, where $\hat\sigma_r^2 = V_r$ and $\lambda\hat\theta = V_s$. Since the $V$'s are all independent,

$$V_r = \sum a_jV_j \quad\text{and}\quad E(VV_r) = E(V_r)E(V) = \sigma_r^2(\sigma_r^2 + \lambda\theta).$$

Also, since $E(V_j^2) = (f_j + 2)\sigma_j^4/f_j$ and $E(V_jV_k) = \sigma_j^2\sigma_k^2$,

$$\sigma_r^4 = E\left[\sum_j \frac{f_jV_j^2}{f_j + 2} + 2\sum_{j<k} a_ja_kV_jV_k\right].$$

Collecting terms, we find that an unbiased estimate of the variance of a mean square $V$, consisting of both random and fixed components, is

$$s^2(V) = \frac{4}{f}VV_r - \frac{2}{f}\left[\sum_j \frac{f_jV_j^2}{f_j + 2} + 2\sum_{j<k} a_ja_kV_jV_k\right],$$

with $V_r = \sum a_jV_j$ and $f$ degrees of freedom for $V$.

y 
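The general estimate above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the `(a_j, V_j, f_j)` tuple layout are hypothetical, and the numbers in the usage line are the treatment and T X B mean squares of Example 23.1 (5,665.7 with 8 degrees of freedom and 308.1 with 64).

```python
def mean_square_variance_estimate(V, f, components):
    """Estimate the variance of a mean square V having f degrees of freedom.

    components: list of (a_j, V_j, f_j), where a_j is +1 or -1 and V_j is an
    independent mean square with f_j degrees of freedom (hypothetical layout).
    """
    # V_r = sum of a_j * V_j estimates the random part of E(V)
    V_r = sum(a * Vj for a, Vj, _ in components)
    # unbiased estimate of sigma_r^4: squared terms, then cross products
    squares = sum(fj * Vj ** 2 / (fj + 2) for _, Vj, fj in components)
    cross = 0.0
    for j in range(len(components)):
        for k in range(j + 1, len(components)):
            a_j, V_j, _ = components[j]
            a_k, V_k, _ = components[k]
            cross += a_j * a_k * V_j * V_k
    sigma_r4_hat = squares + 2.0 * cross
    return (4.0 / f) * V * V_r - (2.0 / f) * sigma_r4_hat


# Single random component: V = V_2, V_r = V_3 (figures from Example 23.1)
s2_V2 = mean_square_variance_estimate(5665.7, 8, [(+1, 308.1, 64)])
```

With a single random component the function reduces to the special formula for $s^2(V_2)$ given earlier in this section.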


† See references 4 and 5.





Example 23.1. An experiment was run to test the effectiveness of 9 different spray materials on cherry trees with 9 blocks (81 trees) and four 1-pound random samples of fruit from each of the 81 trees in the experiment. The variable studied was the number of fruit per 1-pound sample, and it is assumed that block and sample effects are random. The totals of four samples for each of the trees were as shown in Table 23.1:


                                 Table 23.1

                                   Treatments
Blocks      0      1      2      3      4      5      6      7      8    Total

  1       506    471    580    438    497    514    468    455    494    4,423
  2       444    464    718    478    483    484    515    451    507    4,544
  3       452    417    638    485    474    526    495    445    506    4,438
  4       453    443    503    437    500    539    476    457    469    4,277
  5       468    459    596    417    493    516    462    436    470    4,317
  6       427    428    559    457    531    496    442    479    430    4,249
  7       460    468    583    482    509    427    470    468    462    4,329
  8       395    506    571    414    457    452    475    418    486    4,174
  9       455    454    718    429    515    511    406    425    484    4,397

Total   4,060  4,110  5,466  4,037  4,459  4,465  4,209  4,034  4,308   39,148
Mean    112.8  114.2  151.8  112.1  123.9  124.0  116.9  112.1  119.7    120.8


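The total sum of squares between samples for the same tree was 4,610. The other computations are as follows:

$$C = \frac{(39{,}148)^2}{324},$$

$$\mathrm{SST} = \frac{\sum_i T_i^2}{36} - C = 45{,}326,$$

$$\mathrm{SSB} = \frac{\sum_j B_j^2}{36} - C = 2{,}804,$$

$$\mathrm{SS(TB)} = \frac{\sum_{i,j} (TB)_{ij}^2}{4} - C - \mathrm{SST} - \mathrm{SSB} = 19{,}722.$$

The analysis of variance is as follows: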


Source of variation    Degrees of freedom    Sum of squares    Mean square    E(MS)

Blocks                        8                   2,804            350.5      $\sigma^2 + 36\sigma^2_\beta$
Treatments                    8                  45,326          5,665.7      $\sigma^2 + 4\sigma^2_{\tau\beta} + 36\theta(\tau)$
T X B                        64                  19,722            308.1      $\sigma^2 + 4\sigma^2_{\tau\beta}$
Samples in trees            243                   4,610            18.97      $\sigma^2$
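The treatment line of this analysis can be checked numerically. The sketch below is an illustration only; variable names are hypothetical, and the nine treatment totals are taken so that they add to the grand total 39,148 (the first total is read as 4,060 for that reason). The F ratio uses the T X B mean square as the error term, which is what the E(MS) column suggests.

```python
# Treatment totals T_0 ... T_8 (each a sum of 36 samples); assumed values
# chosen so that they add to the grand total 39,148 given in Table 23.1.
treatment_totals = [4060, 4110, 5466, 4037, 4459, 4465, 4209, 4034, 4308]

n_total = 324          # 9 blocks x 9 treatments x 4 samples
n_per_treatment = 36   # 9 blocks x 4 samples

grand_total = sum(treatment_totals)
C = grand_total ** 2 / n_total                       # correction term
SST = sum(t * t for t in treatment_totals) / n_per_treatment - C
MST = SST / (len(treatment_totals) - 1)

MS_TB = 19722 / 64     # T X B mean square from the analysis of variance
F_treatments = MST / MS_TB

print(round(SST), round(MST, 1), round(F_treatments, 2))
```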













Again the experimenter is cautioned against use of these confidence 
limits to test the greatest mean against the smallest, unless he has decided 
to compare these particular treatments (regardless of their ultimate 
standing) before running the experiment. If the results of the experiment 
are used to decide on the comparison to make, the probability levels are 
upset. The reader is again advised to read an article by Tukey® on this 
topic. 

23.2. Split-plot Designs. A special type of incomplete-blocks design is the split-plot design, in which one set of p treatments (T) is arranged in a randomized complete-blocks design with r blocks (B) and a second set of q treatments (A) is assigned at random to the q subplots in each of the pr whole plots. The mathematical model for this experiment is

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \alpha_k + (\tau\alpha)_{ik} + \epsilon_{ijk},$$

where $\alpha$, $\tau$, and $(\tau\alpha)$ are fixed treatment effects, $(\tau\beta)_{ij}$ and $\epsilon_{ijk}$ are random sources of error, and $\beta_j$ is a random block effect.† The analysis of variance for this model is:


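Source of variation    Degrees of freedom    Sum of squares    Mean square    E(MS)†

Blocks (B)             r - 1                 SSB               V_b            $\sigma_e^2 + pq\sigma^2_\beta$
T                      p - 1                 SST               V_t            $\sigma_e^2 + q\sigma^2_{\tau\beta} + rq\,\theta_1(\tau)$
T X B                  (r - 1)(p - 1)        SS(TB)            V_tb           $\sigma_e^2 + q\sigma^2_{\tau\beta}$
A                      q - 1                 SSA               V_a            $\sigma_e^2 + rp\,\theta_2(\alpha)$
T X A                  (p - 1)(q - 1)        SS(TA)            V_ta           $\sigma_e^2 + r\,\theta_3(\tau\alpha)$
Subplot error          p(r - 1)(q - 1)       SSE               V_e            $\sigma_e^2$

† $\theta_1(\tau)$, $\theta_2(\alpha)$, and $\theta_3(\tau\alpha)$ are sums of squares of fixed effects divided by the degrees of freedom.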


The first five sums of squares are computed in the usual manner for main effects and interactions (see Chap. 20) and the subplot error (SSE) by subtraction. Note that we have omitted $\sigma^2_{\tau\beta}$ from the expectation of the blocks mean square.

In making tests of significance of the differences among fixed effects, SSE is the error sum of squares for A and TA and SS(TB) for T. The problem of determining confidence limits for differences between various treatment means is slightly complicated. We know that


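$$(\overline{TA})_{ik} = \mu + \tau_i + \sum_j \beta_j/r + \sum_j (\tau\beta)_{ij}/r + \alpha_k + (\tau\alpha)_{ik} + \sum_j \epsilon_{ijk}/r.$$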



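We have the following types of means to compare:

(i) Two T treatments

$$d_t = \bar{T}_i - \bar{T}_{i'} = (\tau_i - \tau_{i'}) + \sum_j [(\tau\beta)_{ij} - (\tau\beta)_{i'j}]/r + \sum_j\sum_k (\epsilon_{ijk} - \epsilon_{i'jk})/rq,$$

$$E(d_t) = (\tau_i - \tau_{i'}), \qquad \sigma^2(d_t) = \frac{2}{rq}[q\sigma^2_{\tau\beta} + \sigma_e^2], \qquad s^2(d_t) = \frac{2V_{tb}}{rq}.$$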


† In this case we assume that there is no real $(\alpha\beta)$ or $(\tau\alpha\beta)$ interaction but that the only deviation in the subplot is a single random error, $\epsilon_{ijk}$.






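(ii) Two A treatments

$$d_a = \bar{A}_k - \bar{A}_{k'} = (\alpha_k - \alpha_{k'}) + \sum_i\sum_j (\epsilon_{ijk} - \epsilon_{ijk'})/rp,$$

$$E(d_a) = \alpha_k - \alpha_{k'}, \qquad \sigma^2(d_a) = 2\sigma_e^2/rp, \qquad s^2(d_a) = 2V_e/rp.$$

(iii) Two T treatments with the same A treatment

$$d_{tk} = (\overline{TA})_{ik} - (\overline{TA})_{i'k} = (\tau_i - \tau_{i'}) + \sum_j [(\tau\beta)_{ij} - (\tau\beta)_{i'j}]/r + [(\tau\alpha)_{ik} - (\tau\alpha)_{i'k}] + \sum_j (\epsilon_{ijk} - \epsilon_{i'jk})/r,$$

$$E(d_{tk}) = (\tau_i - \tau_{i'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{i'k}],$$

$$\sigma^2(d_{tk}) = 2(\sigma^2_{\tau\beta} + \sigma_e^2)/r, \qquad s^2(d_{tk}) = 2[V_{tb} + (q - 1)V_e]/rq.$$

(iv) Two A treatments with the same T treatment

$$d_{ia} = (\overline{TA})_{ik} - (\overline{TA})_{ik'} = (\alpha_k - \alpha_{k'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{ik'}] + \sum_j (\epsilon_{ijk} - \epsilon_{ijk'})/r,$$

$$E(d_{ia}) = (\alpha_k - \alpha_{k'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{ik'}], \qquad \sigma^2(d_{ia}) = 2\sigma_e^2/r.$$

(v) Two different A treatments and two different T treatments

$$d_{ta} = (\overline{TA})_{ik} - (\overline{TA})_{i'k'} = (\tau_i - \tau_{i'}) + \sum_j [(\tau\beta)_{ij} - (\tau\beta)_{i'j}]/r + (\alpha_k - \alpha_{k'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{i'k'}] + \sum_j (\epsilon_{ijk} - \epsilon_{i'jk'})/r,$$

$$E(d_{ta}) = (\tau_i - \tau_{i'}) + (\alpha_k - \alpha_{k'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{i'k'}],$$

$$\sigma^2(d_{ta}) = 2(\sigma^2_{\tau\beta} + \sigma_e^2)/r, \qquad s^2(d_{ta}) = 2[V_{tb} + (q - 1)V_e]/rq.$$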
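The estimated variances for comparisons (i) to (v) depend only on the whole-plot error mean square, the subplot error mean square, and r, p, q. A minimal Python sketch, assuming hypothetical names `V_tb` and `V_e` for those two mean squares (the function and dictionary keys are illustrative, not notation from the text):

```python
def split_plot_comparison_variances(V_tb, V_e, r, p, q):
    """Estimated variances of the five kinds of mean differences.

    V_tb: T X B (whole-plot error) mean square
    V_e:  subplot error mean square
    r, p, q: numbers of blocks, T treatments, and A treatments
    """
    mixed = 2.0 * (V_tb + (q - 1) * V_e) / (r * q)
    return {
        "i: two T means": 2.0 * V_tb / (r * q),
        "ii: two A means": 2.0 * V_e / (r * p),
        "iii: two T, same A": mixed,
        "iv: two A, same T": 2.0 * V_e / r,
        "v: different T and A": mixed,
    }


# Hypothetical usage with r = p = q = 4
variances = split_plot_comparison_variances(V_tb=17.58, V_e=16.90, r=4, p=4, q=4)
```

Cases (iii) and (v) share one formula because both involve the whole-plot and the subplot error.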

It should be mentioned that if the experimenter wants to use (iii) or (v) to test the difference between two treatment means, he does not know how many degrees of freedom to use, since two error terms are involved. The number of degrees of freedom will fall between (r - 1)(p - 1) and p(r - 1)(q - 1). Hence, if the difference is significant using (r - 1)(p - 1) degrees of freedom, it will be significant for the correct degrees of freedom; and, conversely, if it is not significant using p(r - 1)(q - 1) degrees of freedom, we can conclude it is not significant. However, if it is not significant for the smaller and is significant for the larger number of degrees of freedom, the test is inconclusive. A rather good approximation to the correct number of degrees of freedom is furnished by Satterthwaite:

$$f = \frac{[V_{tb} + (q - 1)V_e]^2}{\dfrac{V_{tb}^2}{(r - 1)(p - 1)} + \dfrac{[(q - 1)V_e]^2}{p(r - 1)(q - 1)}}.$$
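Satterthwaite's approximation applies to any linear combination of independent mean squares, so it can be written once and reused. The sketch below is an illustration under stated assumptions: function names and the `(coefficient, mean_square, df)` tuple layout are hypothetical.

```python
def satterthwaite_df(terms):
    """Approximate degrees of freedom for sum(c_j * V_j).

    terms: list of (c_j, V_j, f_j), where V_j is an independent mean square
    with f_j degrees of freedom and c_j is its coefficient.
    """
    total = sum(c * v for c, v, _ in terms)
    denom = sum((c * v) ** 2 / f for c, v, f in terms)
    return total ** 2 / denom


def split_plot_df(V_tb, V_e, r, p, q):
    """Degrees of freedom for V_tb + (q - 1) V_e in comparisons (iii) and (v)."""
    return satterthwaite_df([
        (1.0, V_tb, (r - 1) * (p - 1)),
        (q - 1.0, V_e, p * (r - 1) * (q - 1)),
    ])
```

When only one mean square enters, the function returns that mean square's own degrees of freedom, which is a convenient sanity check.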

A more complete description of the split-plot design can be found in 
references 8 and 9. This design enables the experimenter to obtain quite 
accurate information on the A treatments and on the interaction between 
T and A at the expense of less accurate information on the T treatments 
as compared with a factorial design (Chap. 20), since the error mean 
square for the T treatments has an added component of variation and is 
estimated with fewer degrees of freedom. Also, with this design, the 
experimenter can allocate the T treatments to rather large plots and 
reserve the small subplots for the A treatments. If one set of treatments is 
cultivation methods and a second set is fertilizer treatments, it would be 
advantageous to put the former on the larger whole plots because of the 
difficulty of handling machinery on small plots. 

Also, this design is useful if the A treatments are successive years in a 
long-term experiment. Unfortunately the years cannot be randomized, 
but it is not the year itself, but the effect of the year which we want to 
randomize. And it seems reasonable to assume that nature does a fairly 
good job of randomizing the year-to-year effects. But in this case it 
seems more reasonable to regard the subplot treatments (years) as random effects, and not fixed. And since they are random, the assumption that the $(\alpha\beta)$ interaction is zero seems somewhat unrealistic. Hence for this case we would advocate using the following model:

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \alpha_k + (\tau\alpha)_{ik} + (\beta\alpha)_{jk} + \epsilon_{ijk},$$

with everything random except $\mu$ and $\tau_i$. Of course, all interactions with $\tau$ are assumed random in one direction only. Also, we still assume $(\tau\alpha\beta)$ is zero, an assumption which does not strike us as being too bad even when the $\alpha$'s are random effects.

Example 23.2. A corn-yield experiment was conducted to compare four methods of primary seedbed preparation (T) and four methods of planting (A), using four blocks (B). The seedbed preparations were used on the whole plots, and each whole plot was divided into four subplots for the four planting methods. The corn yields in bushels per acre were as shown in Table 23.2.


                              Table 23.2

                          Planting methods
Replication       a1        a2        a3        a4       Total

                          t1, plowed 7"
     1           81.8      46.2      78.6      77.7      284.3
     2           72.2      51.6      70.9      73.6      268.3
     3           72.9      53.6      69.8      70.3      266.6
     4           74.6      57.0      69.6      72.3      273.5
   Total        301.5     208.4     288.9     293.9    1,092.7

                          t2, plowed 4"
     1           74.1      49.1      72.0      66.1      261.3
     2           76.2      53.8      71.8      65.5      267.3
     3           71.1      43.7      67.6      66.2      248.6
     4           67.8      58.8      60.6      60.6      247.8
   Total        289.2     205.4     272.0     258.4    1,025.0

                          t3, blank basin listed
     1           68.4      54.5      72.0      70.6      265.5
     2           68.2      47.6      76.7      75.4      267.9
     3           67.1      46.4      70.7      66.2      250.4
     4           65.6      53.3      65.6      69.2      253.7
   Total        269.3     201.8     285.0     281.4    1,037.5

                          t4, disk-harrowed
     1           71.5      50.9      76.4      75.1      273.9
     2           70.4      65.0      75.8      75.8      287.0
     3           72.5      54.9      67.6      75.2      270.2
     4           67.8      50.2      65.6      63.3      246.9
   Total        282.2     221.0     285.4     289.4    1,078.0

   Grand total                                         4,233.2


The four planting methods were surface drill ($a_1$), list ($a_2$), basin list ($a_3$), and list with bedding sweeps ($a_4$). The T and A totals are:

          Sum       Mean                Sum       Mean

T_1     1092.7     68.29      A_1     1142.2     71.39
T_2     1025.0     64.06      A_2      836.6     52.29
T_3     1037.5     64.84      A_3     1131.3     70.71
T_4     1078.0     67.38      A_4     1123.1     70.19





The analysis of variance is as follows:

Source of variation      Degrees of freedom    Sum of squares    Mean square    E(MS)

Blocks                          3                  223.81            74.60      $\sigma_e^2 + 16\sigma^2_\beta$
Seedbed methods (T)             3                  194.56            64.85      $\sigma_e^2 + 4\sigma^2_{\tau\beta} + 16\theta_1(\tau)$
T X B                           9                  158.25            17.58      $\sigma_e^2 + 4\sigma^2_{\tau\beta}$
Planting methods (A)            3                4,107.39         1,369.13      $\sigma_e^2 + 16\theta_2(\alpha)$
T X A                           9                  221.74            24.64      $\sigma_e^2 + 4\theta_3(\tau\alpha)$
Subplot error                  36                  608.47            16.90      $\sigma_e^2$




There are highly significant differences between the planting methods (F = 1,369.13/16.90 with 3 and 36 degrees of freedom) but not between the seedbed methods (although the F = 64.85/17.58 = 3.69 is almost significant, since $F_{.05}$ = 3.86 for 3 and 9 degrees of freedom). There is no evidence of any interaction between the two sets of treatments. An examination of the planting means shows that only $A_2$ is really different from the rest. In order to adjust for the fact that we have selected the smallest from a set of four means, which can be done in 4 ways, the significance probability assuming random choice might be multiplied by 4 to obtain an estimate of the true significance probability. If we take the linear form $l = A_1 + A_3 + A_4 - 3A_2 = 886.8$, we find that $s^2(l) = 192V_e = 3{,}244.80$. Hence

$$t = 886.8/\sqrt{3{,}244.80} = 15.6,$$

for which the estimated significance probability is much less than 4(.0005) = .002. And we see that this 1 degree of freedom accounts for almost all of SSA. The sum of squares for $l$ is $(886.8)^2/192 = 4{,}095.91$, leaving only 11.48 for the other 2 degrees of freedom, a decidedly nonsignificant amount.
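The arithmetic for this single-degree-of-freedom comparison can be reproduced directly from the A totals. The sketch below is illustrative; the variable names are hypothetical, and each A total is taken to be a sum of 16 subplot yields.

```python
import math

# A totals from Example 23.2 (each a sum of 16 subplot yields)
A_totals = {"a1": 1142.2, "a2": 836.6, "a3": 1131.3, "a4": 1123.1}
# Coefficients of the linear form: a2 against the other three methods
coeffs = {"a1": 1, "a2": -3, "a3": 1, "a4": 1}

n_per_total = 16
V_e = 16.90  # subplot error mean square, 36 degrees of freedom

l = sum(coeffs[k] * A_totals[k] for k in A_totals)
divisor = n_per_total * sum(c * c for c in coeffs.values())  # 16 * 12 = 192
s2_l = divisor * V_e            # estimated variance of l
t = l / math.sqrt(s2_l)         # t ratio on 36 degrees of freedom
ss_l = l ** 2 / divisor         # sum of squares for this comparison

print(round(l, 1), divisor, round(t, 1), round(ss_l, 2))
```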

The variance of the difference between two T means is 2(17.58)/16 = 2.20.

The variance of the mean difference between two T treatments with the same A treatment is 2[17.58 + 3(16.90)]/16 = 8.54. This would apply to testing $t_1$ versus $t_2$ when planting method $a_1$ was used, the mean difference being 3.075 in this case. And finally the variance to compare two A treatments with the same T treatment is 2(16.90)/4 = 8.45. This would apply to $a_1$ versus $a_2$ for $t_1$, the mean difference being 23.275.

Example 23.3. Wilm presents some data on soil moisture deficits (in inches of water) as affected by timber cutting with 4 blocks, 5 T treatments (amount of cutting), and 3 years as subplot treatments. The T treatments were in terms of volume of board feet of trees larger than 9.6 inches in diameter which were left in the forest after treatment: uncut (11,900 feet); 6,000 feet; 4,000 feet; 2,000 feet; all cut. The years were 1941, 1942, and 1943. In this case it might be reasonable to regard the years as random and add the (year X block) random effect to the model. The analysis of variance for these data was:


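Source of variation    Degrees of freedom    Mean square    E(MS)

Blocks (B)                    3                1.4832       $\sigma_e^2 + 5\sigma^2_{\beta\alpha} + 15\sigma^2_\beta$
Treatments (T)                4                1.3333       $\sigma_e^2 + 3\sigma^2_{\tau\beta} + 4\sigma^2_{\tau\alpha} + 12\theta(\tau)$
T X B                        12                 .3909       $\sigma_e^2 + 3\sigma^2_{\tau\beta}$
Years (A)                     2                6.5418       $\sigma_e^2 + 5\sigma^2_{\beta\alpha} + 20\sigma^2_\alpha$
T X A                         8                 .2554       $\sigma_e^2 + 4\sigma^2_{\tau\alpha}$
B X A                         6                 .1294       $\sigma_e^2 + 5\sigma^2_{\beta\alpha}$
Remainder (E)                24                 .1053       $\sigma_e^2$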


Using F = .1294/.1053, we see that $\sigma^2_{\beta\alpha}$ does not test significantly greater than zero; and using F = .2554/.1053 = 2.43, we see that $\sigma^2_{\tau\alpha}$ tests significant at the $\alpha$ = .05 probability level. The test of $\sigma^2_\alpha$ is not too good, since there are only 6 degrees of freedom for error, but F = 6.5418/.1294 is significant at the $\alpha$ = .001 level. The difficult test to make involves $\theta(\tau)$. There is no single error term to make this test. Two different tests have been proposed. Both tests are based on the Satterthwaite approximation. In the first case, we use an estimate of $(\sigma_e^2 + 3\sigma^2_{\tau\beta} + 4\sigma^2_{\tau\alpha})$ as the error term for MST. This estimate is $V_{tb} + V_{ta} - V_e = V_r$ with $f'$ degrees of freedom, where

$$f' = \frac{V_r^2}{(V_{tb}^2/f_{tb}) + (V_{ta}^2/f_{ta}) + (V_e^2/f_e)}$$

and the $f$'s are the corresponding degrees of freedom. Then we assume that $F' = V_t/V_r$ is approximately distributed as F with ($f_t$ and $f'$) degrees of freedom. Cochran suggests that we add $V_e$ to $V_t$ and use

$$F'' = \frac{V_t + V_e}{V_{tb} + V_{ta}}$$

with $f_1''$ and $f_2''$ degrees of freedom. $f_1''$ and $f_2''$ would have to be computed as $f'$. There is some indication that $F''$ is more powerful than $F'$ in detecting real treatment effects, but there is some doubt that it is enough better to afford to compute 2 estimated degrees of freedom instead of 1.
In the Wilm problem, $V_r$ = .5410, and $F'$ = 1.3333/.5410 = 2.46 with degrees of freedom 4 and

$$f' = \frac{(.5410)^2}{\dfrac{(.3909)^2}{12} + \dfrac{(.2554)^2}{8} + \dfrac{(.1053)^2}{24}} = 14.$$

$F'$ is not significant at the 5 per cent level; actually $\alpha$ is about .10 in this case. Similarly

$$F'' = 1.4386/.6463 = 2.23$$

with (5,20) degrees of freedom. Again $\alpha$ is about .10.
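Both approximate tests for the Wilm data can be reproduced in a few lines. This is a sketch only; the variable names are hypothetical, and the T X B mean square is taken as .3909, the value that reproduces $V_r$ = .5410.

```python
# Mean squares and degrees of freedom for Example 23.3
V_t, f_t = 1.3333, 4      # treatments
V_tb, f_tb = 0.3909, 12   # T X B
V_ta, f_ta = 0.2554, 8    # T X A
V_e, f_e = 0.1053, 24     # remainder

# First test: F' = V_t / V_r, with V_r = V_tb + V_ta - V_e
V_r = V_tb + V_ta - V_e
f_prime = V_r ** 2 / (V_tb ** 2 / f_tb + V_ta ** 2 / f_ta + V_e ** 2 / f_e)
F_prime = V_t / V_r

# Second test (Cochran): F'' = (V_t + V_e) / (V_tb + V_ta)
numerator = V_t + V_e
denominator = V_tb + V_ta
f1 = numerator ** 2 / (V_t ** 2 / f_t + V_e ** 2 / f_e)
f2 = denominator ** 2 / (V_tb ** 2 / f_tb + V_ta ** 2 / f_ta)
F_double_prime = numerator / denominator

print(round(F_prime, 2), round(f_prime), round(F_double_prime, 2), round(f1), round(f2))
```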

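The variance of the difference between any two T treatment means is

$$\sigma^2(d) = 2\left[\frac{\sigma^2_{\tau\beta}}{r} + \frac{\sigma^2_{\tau\alpha}}{q} + \frac{\sigma_e^2}{rq}\right],$$

$$s^2(d) = \frac{2V_r}{rq} = \frac{2(.5410)}{12} = .0902.$$

Hence the 95 per cent confidence limits for $\delta$ will be

$$d - (2.15)(.30) < \delta < d + (2.15)(.30),$$

where we use a $t$ value with $f'$ = 14 degrees of freedom.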
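A short sketch of the confidence-limit arithmetic follows. It assumes the tabled value t = 2.15 for 14 degrees of freedom quoted in the text rather than computing it; the function name and the sample difference passed to it are hypothetical.

```python
import math

V_r = 0.5410      # V_tb + V_ta - V_e from Example 23.3
r, q = 4, 3       # blocks and years
t_value = 2.15    # 5 per cent t value for 14 degrees of freedom (from the text)

s2_d = 2.0 * V_r / (r * q)       # estimated variance of a difference
s_d = math.sqrt(s2_d)            # standard error, about .30
half_width = t_value * s_d


def confidence_limits(d):
    """95 per cent limits for the true difference, given an observed d."""
    return d - half_width, d + half_width


# Hypothetical observed difference of 1.0 inch between two treatment means
lower, upper = confidence_limits(1.0)
```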

EXERCISES 

23.1. Prove that the expected value of the error mean square in Sec. 18.3 is actually $\sigma^2 + \sigma^2_{\tau\beta}$ if there is an interaction.

23.2. Reexamine the examples and exercises in Sec. 18.3 and Chap. 20 to see in which experiments it is logical to assume the blocks are representative of a wider set of experimental conditions than in the given experiment.

(a) How would the cities in Exercise 18.9 and the plants in Exercise 18.10 have to be selected to make this condition true?

(b) How can you rationalize that consecutive days or consecutive years might have random effects on crop yields or other experimental results, such as in Example 18.3?

(c) Where does this rationale fall down in the analysis of economic data, such as market prices?

23.3. What assumptions must be made concerning the effects in a 
Latin-square design in order to extend the results to a wider horizon than 
the given experiment? 

23.4. Determine the four approximate 90 per cent confidence intervals for $\sigma^2_\beta$ in Example 23.1.
23.5. (a) In Exercise 20.3, what changes should be made in the analysis
if the four blocks and five varieties constituted representative samples 
from large universes of blocks and varieties? 

(b) What are the expectations of the mean squares under this assump¬ 
tion? 

(c) Determine estimates of the random variance components, and 
make the necessary tests of significance. 

(d) Make a test of spacing differences, and determine the standard 
error of the difference between two spacing means if p’ varieties and r' 
blocks were used. 

23.6. A field experiment was conducted to determine an acceptable 
lower limit on the size of similar future experiments. The analysis of 
variance was: 


Source of variation            Degrees of freedom    Mean square

Blocks                                 2                4,399
Treatments                             4                4,429
T X B                                  8                  866
Samples in plots                      15                  239
Determinations in samples             30                    7


(a) Assuming everything random, except treatments, set up the expec¬ 
tations of the mean squares, and determine estimates of the random 
components. 

(b) What is the standard error of the difference between two treatment means with b blocks, s samples per plot, and d determinations per sample? Which of these three sampling plans (which cost the same) would be favored: b = 6, s = 2, d = 1; b = 3, s = 5, d = 1; b = 4, s = 2, d = 2?

(c) Determine the 90 per cent confidence limits for 

23.7. Show how the variance components in Example 23.3 were obtained, and also $\sigma^2(d)$.

23.8. Derive an approximate test for Example 23.2 to test whether the average yield for $t_1$ and $t_4$ was significantly greater than the average of $t_2$ and $t_3$.

23.9. Johnson and Tsao conducted a psychological experiment to determine the difference limen (DL) of subjects for weights increasing at constant rates. Two classes of people were chosen as to sight: normal (A) and congenitally blind (B). Two males and two females were selected to represent each class, giving a total of 8 people. Then the average of five DL values was determined for each subject for each of 28 rate-weight combinations, 7 weights and 4 rates. The entire experiment was repeated at a later date with different people. Hence there was a total of 28 X 8 X 2 = 448 average DL's. The order of presentation of the 28 combinations was randomized for each subject at each date.

(a) Set up the analysis of variance as to sources of variation, degrees of freedom, and expectations of the mean squares.

(b) How would the analysis be changed if each of the five DL values was recorded instead of a single average being reported?

(c) A portion of the data (for 3 weights and 2 rates) is reproduced in the accompanying table for computational purposes.


                          Data of Exercise 23.9

                                     Weight 1        Weight 2        Weight 3
                                       Rate            Rate            Rate
Sex   Sight   Date   Individual      a       b       a       b       a       b

 M      A      1         1          4.5    10.9     4.5    10.1     4.5     8.6
                         2         14.0    30.5    13.9    25.5    12.2    29.3
               2         1          3.1     6.2     3.3     6.8     3.5     7.0
                         2         13.4    26.2    12.8    23.5    12.8    27.8

 M      B      1         1         24.2    48.1    25.3    41.2    25.1    31.4
                         2         19.3    41.0    21.1    30.1    19.6    35.0
               2         1         41.2    59.1    29.8    59.7    28.5    48.7
                         2         19.9    44.1    18.3    28.5    15.9    31.8

 F      A      1         1         18.5    22.3    10.2    25.2    11.4    19.7
                         2          3.1     7.0     3.9     5.4     3.6     7.3
               2         1         11.2    21.8     8.8    14.0     8.1    21.8
                         2          3.9    11.4     5.1    10.2     3.7     8.7

 F      B      1         1          9.6    17.9     7.3    18.2     6.5    15.8
                         2          9.0    16.1     6.4    15.9     6.9    12.9
               2         1          6.1    13.9     6.0    12.5     6.4    12.1
                         2          8.6    14.5     6.9    14.2     5.3    12.0


23.10. Professor J. Wishart furnishes us this example. “The seed mixtures denoted by A, B, C, D below were sown under wheat in 1938, the treatments being randomized over the plots of each block. In the summer of 1939, Blocks I, IV, VI and VII were grazed and Blocks II, III, V and VIII were cut for hay (each successive pair of blocks was taken as a unit and a random choice made as to which block should be grazed and which cut for hay). The table below gives the estimated weights, in pounds, of clover (green) in the May 1940 cut. Analyze these weights fully and draw attention to the significant results. The sum of the 32 weights is 527.4 and the sum of their squares is 9,410.18.”


         I                 II                III                IV
    C       B         B       A         C       D         D       B
    8.7    14.1      15.2    14.8      16.3    22.4      25.1    25.3
    A       D         C       D         A       B         C       A
    8.8    13.2       8.7     9.4       9.4    13.9      19.6    18.1

         V                 VI                VII               VIII
    C       A         C       B         C       B         D       A
   22.2    21.9      22.4    16.6      19.6    18.6      19.9    17.4
    B       D         A       D         D       A         B       C
   21.5    20.6      16.0    13.6      13.3    13.3      14.4    13.1


23.11. Suppose in Example 20.2 that the plot yields were available for 
each of the 12 years with the following sums of squares for years and inter¬ 
actions with years: 


Years (Y) 165,808

Years X treatments {YT) C,J85 

Years X replications (YR) 19,554

Remainder {E) 44,490 


(a) Set up the complete analysis of variance for this experiment. Remember that the analysis in Example 20.2 was based on the total yields for 12 years, while the above sums of squares are based on annual yields.

(b) What conclusions would you draw from this analysis?

(c) What aspects of these data might violate the assumptions behind 
the use of the analysis of variance in making tests of significance? Do 
you have any suggestions for improving the analysis? 

23.12. In Exercise 22.4, assume that the treatment effects are fixed. 

(a) Derive the expectation of the treatment mean square. 

(b) How would you test for treatment differences?

23.13. In a bee experiment conducted by E. Oertel in Louisiana, the honey was collected from 32 colonies randomly assigned to 4 rows of 8 colonies each for each of 5 years. It seems reasonable to assume that colony and year effects are random and row effects are fixed. We note that this is a completely randomized design for the whole plots (rows are T treatments). The analysis of variance for the yields in pounds of honey per year was:


Source of variation       Degrees of freedom    Mean square    E(MS)

Rows (T)                                           15,503      $\sigma^2_{c\alpha} + 5\sigma^2_c + 8\sigma^2_{\tau\alpha} + 40\theta(\tau)$
Colonies in rows (C)                                2,529
Years (A)                                         107,297      $\sigma^2_{c\alpha} + 32\sigma^2_\alpha$
T X A                                               3,019
C X A                                               2,589



(a) Set up a mathematical model for this experiment. 

(b) Fill in the degrees-of-freedom column, and show that the E(MS) column is correct.

(c) What do you conclude regarding the colony-to-colony variation in the same row?

(d) Is there much evidence of a real row-by-year interaction in this problem?

(e) What is the most important source of variation? What are approximate 90 per cent confidence limits for this component?

(f) How would you test for row differences?

(g) The mean productions per colony per year for each row were: 182.5, 149.2, 158.9, and 135.6. Show that the standard error for the difference between any two row means is 12.6.

23.14. Suppose data were available on the average sales price per pound of tobacco for a period of y years at each of m markets. These markets are divided into 4 geographical areas with $m_1$, $m_2$, $m_3$, and $m_4$ markets per area.

(a) Set up an analysis of variance to reflect the sources of variation in the marketing picture, assuming that areas are fixed and the remaining components are random.

(b) Determine the expectations of the mean squares.

(c) What is the variance of a mean over all 4 areas with m' markets per area and y' years?

23.15. A randomized complete-blocks experiment was conducted on the percentage of acid in orange juice with 4 blocks and 5 fertilizers, each plot having 3 trees. Two samples were taken from each tree, one sample being picked by one man and the second sample by another man. The basic analysis of variance was as follows:


Source of variation      Degrees of freedom    Mean square    E(MS)

Blocks (B)                      3                .002315
Fertilizers (T)                 4                .042064
T X B                          12                .009133
Trees in plots (E)             40                .004735
Men (A)                         1                .012813
T X A                           4                .003924
A X B                           3                .003449
Remainder                      52                .002749



(a) Set up a model for this experiment, assuming we want to make inferences about only these fertilizers, but that all other effects are representative of large populations.

(b) Compute the expectations of the mean squares for your model.

(c) Compute estimates of the variance components.

(d) How would you test for fertilizer differences?

(e) What is the variance of the mean amount of acid for any one fertilizer if r' blocks, q' trees per plot, and m' samples are used? Compute this variance for r' = 4, q' = 3, m' = 2; r' = 6, q' = 1, m' = 4.

23.16. (a) In Sec. 20.6, show that if the replication effects are assumed to be random, we can obtain the following analysis for the (2r - 1) degrees of freedom between blocks:


Source of variation 

Degrees of i 
freedom 

Mean 

square 

F(MS) 

Replications (R) 

r - 1 

MSR 


ACD 

1 

MS(4e7)) 

+ 44. + irOir) 

{ACD) X R 

r - 1 

1 _ 

MSE't 

+ 4<r^ 


† MSE' is the whole-plot error mean square, and $\sigma^2_{e'}$ is the variance for the (replication) X (ACD) error.

(b) Apply this result to test the NPK interaction in Exercise 20.15.
CHAPTER 24


THE RECOVERY OF INTERBLOCK INFORMATION 
IN INCOMPLETE-BLOCKS DESIGNS 

24.1. Balanced Incomplete-blocks Designs. It was mentioned in Sec. 19.3 that Yates introduced a new computing technique for balanced incomplete-blocks designs which utilized the so-called interblock information on treatment effects. The basic change in the model is that the $\beta$'s are assumed to be random effects with the same variance, $\sigma^2_\beta$, the plot-to-plot (intrablock) error being designated by $\sigma_e^2$. From Sec. 19.3 we know that the intrablock estimate, $t_{i1}$, of $\tau_i$ had a variance

$$\sigma^2(t_{i1}) = \frac{(p - 1)\sigma_e^2}{prE_f} = \frac{p - 1}{p}\cdot\frac{1}{rE_fW_1},$$

where $W_1 = 1/\sigma_e^2$. We can obtain a second estimate of $\tau_i$ (the interblock estimate) from $\sum_j n_{ij}B_j = T_{bi}$ (see Sec. 19.3):

$$t_{i2} = \frac{T_{bi} - rkm}{r - \lambda}, \qquad \sigma^2(t_{i2}) = \frac{p - 1}{p}\cdot\frac{k}{(r - \lambda)W_2},$$

where $W_2 = 1/(\sigma_e^2 + k\sigma^2_\beta)$.

If we have two independent estimates ($t_1$ and $t_2$) of the same quantity, $\tau$, the combined estimate with minimum variance is given by

$$\frac{W_1't_1 + W_2't_2}{W_1' + W_2'},$$

where $W_i' = 1/\sigma^2(t_i)$. Hence the combined estimate of $\tau_i'\,(= \tau_i + \mu)$, formed from the intrablock estimate $t_{i1}$ and the interblock estimate $t_{i2}$, is

$$t_{ic}' = t_{ic} + m = \frac{rE_fW_1t_{i1} + \dfrac{(r - \lambda)W_2}{k}\,t_{i2}}{rE_fW_1 + \dfrac{(r - \lambda)W_2}{k}} + m$$

$$= \frac{\dfrac{(kT_i - T_{bi})W_1}{k} + \dfrac{T_{bi}W_2}{k} - \dfrac{kGW_2}{pk}}{rE_fW_1 + \dfrac{(r - \lambda)W_2}{k}} + \frac{G}{rp}$$

$$= \frac{T_i}{r} + \theta D_i = \frac{T_i + r\theta D_i}{r},$$

where $\theta = (W_1 - W_2)/r[p(k - 1)W_1 + (p - k)W_2]$ and $D_i = [(p - k)T_i - (p - 1)T_{bi} + (k - 1)G]$.

Certain identities are useful in making the above simplification:

$$(r - \lambda)(p - 1) = r(p - k) \quad\text{and}\quad kE_f(p - 1) = p(k - 1).$$

The variance of the difference between two adjusted means, using interblock information, is

$$\sigma^2(d) = \frac{2k(p - 1)}{r[p(k - 1)W_1 + (p - k)W_2]}.$$

The computations needed to estimate $W_1$ and $W_2$ are the following.

(i) SSB(unadj) in the usual manner.

(ii) $\dfrac{\sum_i T_{bi}^2}{k(r - \lambda)} - \dfrac{(\sum_i T_{bi})^2}{pk(r - \lambda)} = \mathrm{SST}_b$.

(iii) $\dfrac{\sum_i D_i^2}{N(p - k)(k - 1)} = \mathrm{SSD}$.

(iv) SSB(adj) = SSD + SSB(unadj) - SST$_b$, with (q - 1) degrees of freedom. $V_b$ = MSB(adj).

(v) SSE = $Sy^2$ - SSB(adj) - SST(unadj), with N - p - q + 1 degrees of freedom. $V_e$ = MSE.

(vi) It can be shown that $E(V_b) = \sigma_e^2 + \dfrac{N - p}{q - 1}\sigma^2_\beta$. Then $W_1 = 1/V_e$ and

$$W_2 = \frac{N - p}{k(q - 1)V_b - (p - k)V_e}.$$

(vii) Compute $r\theta$ and then $t_{ic}' = (T_i + r\theta D_i)/r$.

In general it is also advisable to compute SST(adj) as a check on the computation and to make a test of the differences among the treatment means. If the experiment is large enough so that we can replace $W_1$ and $W_2$ by their estimates in the formula for $\theta$, the problem is solved. If there are fewer than 15 degrees of freedom in $V_b$, the authors do not recommend using this weighted analysis; instead, they advise using the unweighted analysis of Sec. 19.3 (using no interblock information). And with 15 to 25 degrees of freedom, the weights are not too stable.
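The rule used throughout this section, weighting two independent estimates of the same quantity by the reciprocals of their variances, can be sketched as follows. The function name and the numbers in the usage line are hypothetical, and the variances are treated as known; in practice they are replaced by estimates, which is the source of the instability just mentioned.

```python
def combine_estimates(t1, var1, t2, var2):
    """Minimum-variance combination of two independent estimates.

    Each estimate is weighted by the reciprocal of its variance; the
    variance of the combined estimate is 1 / (W1 + W2).
    """
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    combined = (w1 * t1 + w2 * t2) / (w1 + w2)
    combined_variance = 1.0 / (w1 + w2)
    return combined, combined_variance


# Hypothetical intrablock and interblock estimates of one treatment effect
estimate, variance = combine_estimates(10.0, 1.0, 20.0, 3.0)
```

The combined variance is never larger than the smaller of the two input variances, which is why recovering interblock information cannot hurt when the weights are known exactly.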

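If groups of blocks form c complete replications, the sum of squares for replications with (c - 1) degrees of freedom is removed from SSB(adj) and the expected value for $V_b$ [with (q - c) degrees of freedom] is

$$E(V_b) = \sigma_e^2 + \frac{N - p - k(c - 1)}{q - c}\,\sigma^2_\beta.$$

Hence

$$W_2 = \frac{N - p - k(c - 1)}{k(q - c)V_b - (p - k)V_e}.$$

In particular, if c = r and p = k²,

$$E(V_b) = \sigma_e^2 + \frac{k(r - 1)}{r}\,\sigma^2_\beta, \qquad W_2 = \frac{r - 1}{rV_b - V_e}.$$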


This is the result for balanced lattices with c? = 1 . 

Example 24.1. Suppose we estimate $\sigma^2_\beta$ from the balanced-lattice experiment of Example 19.1. The values of $D_i = 6T_i - 8T_{bi} + 2G$ are

242, -6, -276, 118, 74, -28, -80, -76, and 32.

$$\mathrm{SST}_b = \frac{(152)^2 + \cdots + (188)^2}{9} - \frac{(1{,}512)^2}{81} = 127,$$

$$\mathrm{SSD} = \frac{\sum_i D_i^2}{432} = 389,$$

$$\mathrm{SSB(adj)} = 389,$$

since SST$_b$ = SSB(unadj) - SS(replications).

$$V_b = \frac{389}{8} = 48.63, \qquad V_e = 28.57.$$

$$W_2 = \frac{3}{4V_b - V_e} = \frac{3}{4(48.63) - 28.57}, \qquad W_1 = \frac{1}{V_e} = .03500,$$

$$\theta = \frac{W_1 - W_2}{4(18W_1 + 6W_2)} = \frac{V_b - V_e}{72V_b} = \frac{20.06}{3{,}501}.$$

The adjusted treatment means, $t_{ic}' = (T_i/r) + \theta D_i$, are 20.13, 11.72, 2.92, 22.93, 14.17, 11.34, 13.54, 7.06, 22.18. The variance and standard error of the difference between two adjusted means are

$$\sigma^2(d) = \frac{48}{4(18W_1 + 6W_2)} = \frac{2}{3W_1 + W_2} = 16.25, \qquad \sigma(d) = 4.03.$$






Using recovery of interblock information, the relative efficiency of this design compared with randomized complete blocks is

$$I = 17.64/16.25 = 1.09.$$

Hence the recovery resulted in a gain of 16 per cent over no recovery. However, with only 8 degrees of freedom to estimate $\sigma^2_\beta$, we believe that this is a rather fictitious gain since the estimates of the weights ($W_1$ and $W_2$) have considerable error.

24.2. Simple Lattice Designs. In Sec. 19.4, we indicated that most analyses of lattice designs also make use of interblock information to increase the efficiency of the estimates of treatment effects. Since the blocks are arranged in complete replications, a constant $\gamma$ is introduced into the model for fixed replication effects, with random $\beta$ effects within replications. As before, we assume that we have p = k² treatments in k blocks of k treatments per block. Since the simple lattice has 2 replications, we have 2k blocks and 2k² plots.

The first replication is designated as X and the second as Y with individual yields $X_{ij}$ and $Y_{ij}$, where (i,j) = 1, 2, . . . , k. The mathematical model for $X_{ij}$ and $Y_{ij}$ is

$$X_{ij} = \mu + \gamma_x + \beta_{xi} + \tau_{ij} + \epsilon_{ij},$$

$$Y_{ij} = \mu + \gamma_y + \beta_{yj} + \tau_{ij} + \epsilon_{ij}',$$

where $\beta$, $\epsilon$, and $\epsilon'$ are independent random effects and the others are fixed; the $\{\beta\}$ are NID(0, $\sigma^2_\beta$) and the $\{\epsilon, \epsilon'\}$ are NID(0, $\sigma_e^2$). Hence


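$$G = 2k^2\mu + k\Big(\sum_i \beta_{xi} + \sum_j \beta_{yj}\Big) + \sum_i\sum_j (\epsilon_{ij} + \epsilon_{ij}'),$$

$$X_{..} = k^2(\mu + \gamma_x) + k\sum_i \beta_{xi} + \sum_i\sum_j \epsilon_{ij},$$

$$Y_{..} = k^2(\mu + \gamma_y) + k\sum_j \beta_{yj} + \sum_i\sum_j \epsilon_{ij}',$$

$$X_{i.} = k(\mu + \gamma_x) + k\beta_{xi} + \sum_j \tau_{ij} + \sum_j \epsilon_{ij},$$

$$Y_{i.} = k(\mu + \gamma_y) + \sum_j \beta_{yj} + \sum_j \tau_{ij} + \sum_j \epsilon_{ij}',$$

$$D_{i.} = X_{i.} - Y_{i.} = k(\gamma_x - \gamma_y) + k\beta_{xi} - \sum_j \beta_{yj} + \sum_j (\epsilon_{ij} - \epsilon_{ij}'),$$

$$D_{..} = k^2(\gamma_x - \gamma_y) + k\Big(\sum_i \beta_{xi} - \sum_j \beta_{yj}\Big) + \sum_i\sum_j (\epsilon_{ij} - \epsilon_{ij}'),$$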

where $D_{i.}$ is a linear function of the block effects adjusted for treatment effects. Similarly we can compute $D_{.j}$, using $X_{.j}$ and $Y_{.j}$. These (2k) D's could be used to solve for the $\beta$'s, adjusted for the $\tau$'s.

We can use $X_{..}$ and $Y_{..}$ to estimate $\mu$, $\gamma_x$, and $\gamma_y$. The replication sum of squares in the analysis of variance (with 1 degree of freedom) is

$$\frac{X_{..}^2 + Y_{..}^2}{k^2} - \frac{G^2}{2k^2} = \frac{(X_{..} - Y_{..})^2}{2k^2}$$

with an expected value of $\tfrac{1}{2}k^2(\gamma_x - \gamma_y)^2 + k\sigma^2_\beta + \sigma_e^2$.

A block sum of squares, adjusted for treatments, with 2(k - 1) degrees of freedom, can be computed from

$$\mathrm{SSB(adj)} = \frac{\sum_i D_{i.}^2 + \sum_j D_{.j}^2}{2k} - \frac{2D_{..}^2}{2k^2},$$

$$E[\mathrm{SSB(adj)}] = 2(k - 1)\Big(\sigma_e^2 + \frac{k\sigma^2_\beta}{2}\Big).$$

Hence if we designate $V_b$ = SSB(adj)/2(k - 1),

$$E(V_b) = \sigma_e^2 + \frac{k\sigma^2_\beta}{2}.$$

It can be shown that

$$\mathrm{SSE} = Sx^2 + Sy^2 - \mathrm{SST(unadj)} - \mathrm{SSB(adj)},$$

with 2(k² - 1) - (k² - 1) - 2(k - 1) = (k - 1)² degrees of freedom. Also, $E(V_e) = E(\mathrm{MSE}) = \sigma_e^2$, so that

$$\hat\sigma^2_\beta = 2(V_b - V_e)/k.$$


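In order to estimate a treatment effect, $\tau_{ij}$, we shall write†

$$
\begin{aligned}
t_{ij} &= t_{i\cdot} + t_{\cdot j} + (t_{ij} - t_{i\cdot} - t_{\cdot j}) \\
&= (t'_{i\cdot} + t'_{\cdot j} - 2m) + (t'_{ij} - t'_{i\cdot} - t'_{\cdot j} + m) = t + v.
\end{aligned}
$$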


In Sec. 19.4, we used $Y_{i\cdot}$ as an estimate of $t'_{i\cdot}$ and $X_{\cdot j}$ as an estimate of $t'_{\cdot j}$, since these two estimates were free of block effects. However, now that we have a means of handling the block effects, it seems reasonable to utilize this interblock information.

It turns out that the part in the second parentheses ($v$) is independent of block effects, even when we use all of the data to estimate $t'_{i\cdot}$ and $t'_{\cdot j}$.


† This method is suggested by the similarity between this design and a factorial with $k$ levels of two treatments. The effect of the $i$th level of one and $j$th level of a second is generally written in this form. Let $t$ stand for the main effects (first parentheses) and $v$ for the interaction (second parentheses).




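That is, if we use all of the data,

$$
\begin{aligned}
t'_{i\cdot} &= \frac{X_{i\cdot} + Y_{i\cdot}}{2k} = \mu + \bar\tau_{i\cdot} + \frac{\beta_{xi} + \bar\beta_{y\cdot}}{2} + \frac{\bar\epsilon_{i\cdot} + \bar\epsilon'_{i\cdot}}{2}, \\
t'_{\cdot j} &= \frac{X_{\cdot j} + Y_{\cdot j}}{2k} = \mu + \bar\tau_{\cdot j} + \frac{\bar\beta_{x\cdot} + \beta_{yj}}{2} + \frac{\bar\epsilon_{\cdot j} + \bar\epsilon'_{\cdot j}}{2}, \\
t'_{ij} &= \frac{X_{ij} + Y_{ij}}{2} = \mu + \tau_{ij} + \frac{\beta_{xi} + \beta_{yj}}{2} + \frac{\epsilon_{ij} + \epsilon'_{ij}}{2}, \\
m &= \frac{G}{2k^2} = \mu + \frac{\bar\beta_{x\cdot} + \bar\beta_{y\cdot}}{2} + \frac{\bar\epsilon_{\cdot\cdot} + \bar\epsilon'_{\cdot\cdot}}{2},
\end{aligned}
$$

where $\bar\epsilon_{i\cdot} = \sum_j \epsilon_{ij}/k$, etc. Hence

$$v = (t'_{ij} - t'_{i\cdot} - t'_{\cdot j} + m) = \tau_{ij} - \bar\tau_{i\cdot} - \bar\tau_{\cdot j} + \epsilon_v,$$

where $\epsilon_v = \tfrac{1}{2}[\epsilon_{ij} + \epsilon'_{ij} - (\bar\epsilon_{i\cdot} + \bar\epsilon'_{i\cdot} + \bar\epsilon_{\cdot j} + \bar\epsilon'_{\cdot j}) + (\bar\epsilon_{\cdot\cdot} + \bar\epsilon'_{\cdot\cdot})]$.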

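However, if we use all of the data to estimate $t = (t'_{i\cdot} + t'_{\cdot j} - 2m)$, block effects will be involved. Hence it is proposed that we obtain two different estimates of $t$, one free of block effects ($t_1$) and one involving block effects ($t_2$), and form a weighted average of the two,

$$t = \frac{W'_1 t_1 + W'_2 t_2}{W'_1 + W'_2},$$

where $W'_i = 1/\sigma_i^2$. Then we shall have

$$t_1 = \frac{Y_{i\cdot} + X_{\cdot j}}{k} - \frac{G}{k^2} = \bar\tau_{i\cdot} + \bar\tau_{\cdot j} + \epsilon_1,$$

$$t_2 = \frac{X_{i\cdot} + Y_{\cdot j}}{k} - \frac{G}{k^2} = \bar\tau_{i\cdot} + \bar\tau_{\cdot j} + \beta_2 + \epsilon_2,$$

where $\epsilon_1 = (\bar\epsilon'_{i\cdot} + \bar\epsilon_{\cdot j} - \bar\epsilon_{\cdot\cdot} - \bar\epsilon'_{\cdot\cdot})$, $\epsilon_2 = (\bar\epsilon_{i\cdot} + \bar\epsilon'_{\cdot j} - \bar\epsilon_{\cdot\cdot} - \bar\epsilon'_{\cdot\cdot})$, and $\beta_2 = \beta_{xi} + \beta_{yj} - \bar\beta_{x\cdot} - \bar\beta_{y\cdot}$. Hence

$$\sigma_1^2 = \frac{2(k - 1)}{k^2}\sigma^2 \quad\text{and}\quad \sigma_2^2 = \frac{2(k - 1)}{k^2}[\sigma^2 + k\sigma_\beta^2].$$

Therefore $t = [W_1 t_1 + W_2 t_2]/(W_1 + W_2)$, where $W_1 = 1/\sigma^2$ and $W_2 = 1/(k\sigma_\beta^2 + \sigma^2)$.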
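The weighting step can be illustrated with a few lines of Python. This is only a sketch: the function names and the numerical inputs are hypothetical, and the variance returned for the combined estimate, $1/(W'_1 + W'_2)$, is the usual result for inverse-variance weighting of independent estimates rather than something stated in this section.

```python
def combine_estimates(t1, var1, t2, var2):
    """Inverse-variance weighted average of two independent unbiased estimates.

    The weights are W'_i = 1 / var_i, as in the weighted average above.
    Returns the combined estimate and its variance 1 / (W'_1 + W'_2).
    """
    w1, w2 = 1.0 / var1, 1.0 / var2
    t = (w1 * t1 + w2 * t2) / (w1 + w2)
    return t, 1.0 / (w1 + w2)


def lattice_weights(sigma2, sigma2_beta, k):
    """Relative weights W1 = 1/sigma^2 and W2 = 1/(k*sigma_beta^2 + sigma^2).

    The common factor 2(k - 1)/k^2 of the two variances cancels in the
    weighted average, so it is omitted here.
    """
    return 1.0 / sigma2, 1.0 / (k * sigma2_beta + sigma2)


# Hypothetical inputs, chosen only to show the mechanics.
t, var_t = combine_estimates(10.0, 1.0, 20.0, 4.0)
w1, w2 = lattice_weights(sigma2=2.0, sigma2_beta=1.0, k=3)
print(t, var_t, w1, w2)
```

When $\sigma_\beta^2 = 0$ the two weights coincide and the combined estimate is the simple average of $t_1$ and $t_2$.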

Again we caution against the use of a weighted analysis if there are fewer than 15 degrees of freedom for estimating $\sigma_\beta^2$ (and we have grave doubts when there are fewer than 25 degrees of freedom). It is better to duplicate the experiment to obtain more information on $\sigma_\beta^2$ if weighting is to be used.

In computing adjusted treatment means, some simplification can be made. We note that

$$
\begin{aligned}
\hat t'_{ij} &= \frac{X_{ij} + Y_{ij}}{2} - \frac{X_{i\cdot} + Y_{i\cdot} + X_{\cdot j} + Y_{\cdot j}}{2k} + \frac{W_1(Y_{i\cdot} + X_{\cdot j}) + W_2(X_{i\cdot} + Y_{\cdot j})}{k(W_1 + W_2)} \\
&= \frac{X_{ij} + Y_{ij}}{2} - \frac{W_1 - W_2}{2k(W_1 + W_2)}(D_{i\cdot} - D_{\cdot j}).
\end{aligned}
$$

Note that $(D_{i\cdot} - D_{\cdot j})$ is the same as $\delta_{ij}$ given in Sec. 19.4. The only difference in $\hat t'_{ij}$ is that $\delta_{ij}$ is multiplied by the weighting factor

$$\theta = \frac{W_1 - W_2}{2k(W_1 + W_2)}$$

instead of $1/2k$. If $\sigma_\beta^2 = 0$, so that $W_1 = W_2$, no adjustment at all is made for blocks: $\hat t'_{ij} = T_{ij}/2$. If $\sigma^2$ is small compared with $\sigma_\beta^2$, $\theta$ approaches $1/2k$, the value used without recovery of interblock information.

It can be shown that the variance of the difference between two adjusted treatment means, which appear in the same block, is

$$\frac{1}{kW_1}\left[(k - 1) + \frac{2W_1}{W_1 + W_2}\right].$$

And for varieties not in the same block, this variance is

$$\frac{1}{kW_1}\left[(k - 2) + \frac{4W_1}{W_1 + W_2}\right].$$

The average variance is

$$\frac{1}{(k + 1)W_1}\left[(k - 1) + \frac{4W_1}{W_1 + W_2}\right] = \frac{1}{W_1}\left[1 + \frac{4k\theta}{k + 1}\right],$$

with $\theta = (W_1 - W_2)/2k(W_1 + W_2)$ as above. Also, the efficiency of the simple lattice, relative to randomized complete blocks, is

$$\frac{k + W_1/W_2}{(k - 1) + 4W_1/(W_1 + W_2)}.$$

The reader is advised to read references 2 and 3 for more details on lattice designs.
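The formulas of this section can be collected into one small routine. The sketch below is illustrative only: the function name and the inputs are hypothetical, $W_1$ and $W_2$ are estimated by $1/V_e$ and $1/(2V_b - V_e)$ as in the example of this section, and the variance for treatments not in the same block is taken as $[(k - 2) + 4W_1/(W_1 + W_2)]/kW_1$, the value consistent with the same-block and average variances.

```python
def simple_lattice_summary(k, v_e, v_b):
    """Weights, adjustment factor, variances and efficiency for a simple lattice.

    k   : number of plots per block (k*k treatments, 2 replications)
    v_e : intrablock error mean square, V_e
    v_b : adjusted-blocks mean square, V_b
    If V_b <= V_e the estimate of sigma_beta^2 is not positive, so it is
    taken as zero (W2 = W1) and no block adjustment is made.
    """
    if k < 2 or v_e <= 0:
        raise ValueError("need k >= 2 and a positive error mean square")
    w1 = 1.0 / v_e
    w2 = 1.0 / (2.0 * v_b - v_e) if v_b > v_e else w1
    ratio = w1 / (w1 + w2)
    theta = (w1 - w2) / (2.0 * k * (w1 + w2))
    return {
        "theta": theta,
        "var_same_block": ((k - 1) + 2.0 * ratio) / (k * w1),
        "var_other": ((k - 2) + 4.0 * ratio) / (k * w1),
        "var_average": ((k - 1) + 4.0 * ratio) / ((k + 1) * w1),
        "efficiency": (k + w1 / w2) / ((k - 1) + 4.0 * ratio),
    }


# Hypothetical mean squares for a 3 x 3 simple lattice.
summary = simple_lattice_summary(k=3, v_e=10.0, v_b=30.0)
for name, value in summary.items():
    print(name, round(value, 4))
```

With equal mean squares the routine returns $\theta = 0$ and an efficiency of 1, which agrees with the remark that no adjustment is made when $\sigma_\beta^2 = 0$.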

If the simple lattice is duplicated several times (Sec. 19.4) so that $2d$ replications ($2kd$ blocks) are used, it can be shown that the expected value of the "block-effect" mean square with $2(d - 1)(k - 1)$ degrees of freedom is $(\sigma^2 + k\sigma_\beta^2)$. The expected value of the other mean square with $2(k - 1)$ degrees of freedom is, as above, $(\sigma^2 + \tfrac{1}{2}k\sigma_\beta^2)$. Hence, if the two mean squares are pooled to obtain the adjusted-blocks mean square with $2d(k - 1)$ degrees of freedom, the expected value of this mean square is

$$\sigma^2 + \frac{2d - 1}{2d}k\sigma_\beta^2.$$

In this latter case,

$$W_2 = \frac{2d - 1}{2dV_b - V_e} \quad\text{and}\quad \theta = \frac{d(V_b - V_e)}{2k[dV_b + (d - 1)V_e]}.$$

The variances of adjusted mean differences will be divided by $d$.
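The weights for the duplicated lattice can be checked from the two expected mean squares. The following is a sketch of that algebra, assuming (as in the example of this section) that $W_1$ is estimated by $1/V_e$ and that $\theta = (W_1 - W_2)/2k(W_1 + W_2)$:

```latex
\begin{aligned}
E(V_b) &= \frac{2(d-1)(k-1)\,(\sigma^2 + k\sigma_\beta^2) + 2(k-1)\,(\sigma^2 + \tfrac12 k\sigma_\beta^2)}{2d(k-1)}
        = \sigma^2 + \frac{2d-1}{2d}\,k\sigma_\beta^2, \\[4pt]
\widehat{\sigma^2 + k\sigma_\beta^2} &= V_e + \frac{2d}{2d-1}\,(V_b - V_e) = \frac{2dV_b - V_e}{2d-1}
        \quad\Longrightarrow\quad W_2 = \frac{2d-1}{2dV_b - V_e}, \\[4pt]
W_1 - W_2 &= \frac{2d\,(V_b - V_e)}{V_e\,(2dV_b - V_e)}, \qquad
W_1 + W_2 = \frac{2\,[dV_b + (d-1)V_e]}{V_e\,(2dV_b - V_e)}, \\[4pt]
\theta &= \frac{W_1 - W_2}{2k\,(W_1 + W_2)} = \frac{d\,(V_b - V_e)}{2k\,[dV_b + (d-1)V_e]}.
\end{aligned}
```

Setting $d = 1$ gives $W_2 = 1/(2V_b - V_e)$ and $\theta = (V_b - V_e)/2kV_b$, the values for a single simple lattice.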

Example 24.2. In order to illustrate only the computing procedure for the recovery of interblock information, we shall use Example 19.2. Actually one should not contemplate any recovery with so few degrees of freedom for blocks.

                  1      2      3    Total
    D_i.        -22    -21     -3     -46
    D_.j          2    -16    -32     -46

$$\mathrm{SSB(adj)} = \frac{\sum (D_{i\cdot}^2 + D_{\cdot j}^2)}{6} - \frac{(-46)^2}{9} = 134.$$

As a check, SSB(unadj) + SST(adj) should be equal to SSB(adj) + SST(unadj). We note that

68 + 593 = 134 + 527 = 661.

It is advisable to make this check on the computing. The analysis of variance is:

    Source of variation    Degrees of freedom    Mean square    E(MS)
    Replications                   1                118
    Treatments (adj)               8                 74
    Blocks (adj)                   4                 34 = V_b   $\sigma^2 + 1.5\sigma_\beta^2$
    Error                          4                 47 = V_e   $\sigma^2$

The estimate of $\sigma_\beta^2$ would be negative, and so we assume $\sigma_\beta^2 = 0$. Hence, no adjustments would be made even if there were sufficient degrees of freedom to estimate the weights. This raises a rather critical issue regarding incomplete-blocks designs, namely, that we do not take the loss in efficiency which we really suffered by using such a design with a negative estimate of $\sigma_\beta^2$. Instead we use $t'_{ij} = T_{ij}/2$, with the variance of a mean equal to $V_e/2 = 24$.

If $V_b > V_e$ so we could obtain a positive estimate of $\sigma_\beta^2$ and if there were sufficient degrees of freedom to warrant weighting, we would compute $\delta_{ij} = D_{i\cdot} - D_{\cdot j}$. Then $\theta = (W_1 - W_2)/6(W_1 + W_2)$, where $W_1 = 1/V_e$ and $W_2 = 1/(2V_b - V_e)$. (Here we assume that ^ = 0.) An adjusted treatment mean is

$$\hat t'_{ij} = \frac{T_{ij}}{2} - \theta\delta_{ij}.$$

The estimated average variance of the difference between two adjusted means is

$$\frac{V_e}{2}\left[1 + \frac{2V_b - V_e}{V_b}\right].$$
24.3. Other Lattice Designs. Reference 3 contains the theory and computing details for a triple lattice and reference 2 the computing procedures for rectangular lattices. One of the present authors presented the method of analysis for $d$ duplications of any orthogonal square lattice design.† If there are $r$ replications in the basic design ($rd$ total replications) and if the two parts of the adjusted-blocks sum of squares are pooled with $rd(k - 1)$ degrees of freedom,

$$W_2 = \frac{rd - 1}{rdV_b - V_e} \quad\text{and}\quad \theta = \frac{W_1 - W_2}{rk[(r - 1)W_1 + W_2]}.$$


The average variance of the difference between two adjusted means is

$$\frac{2}{rdW_1}\left[1 + \frac{r^2k\theta}{k + 1}\right].$$


EXERCISES 

24.1. (a) Prove that, if we have two unbiased independent estimates, $t_1$ and $t_2$, of a parameter $\tau$ with respective variances $1/W'_1$ and $1/W'_2$, the combined linear unbiased estimate with lowest variance will be

$$t = \frac{W'_1 t_1 + W'_2 t_2}{W'_1 + W'_2}.$$

Hint: remember that $E(t) = \tau$.

(b) Show that $t_1$ and $t_2$ are independent estimates of $\tau$ in Sec. 24.1.

24.2. (a) Derive the formula for $\theta$ and show that $t'_i = (T_i/r) + \theta D_i$.

(b) Also derive the variance of the difference between two adjusted means.

(c) Show that SSB(adj) is independent of treatment effects.

(d) Show that $E(V_b)$ and the estimate of $W_2$ are correct in (vi) above.

† Presented by R. L. Anderson at a meeting of the Biometrics Section of the American Statistical Association, March, 1946. An abstract is presented in the June, 1946, issue of the Biometrics Bulletin (p. 58).

24.3. As a computing exercise only, analyze the data in Exercise 19.1, using recovery of interblock information. Why is it dangerous in practice to use recovery for this example?

24.4. Analyze the data in Exercises 19.2 and 19.3, using recovery of 
interblock information. 

24.5. Prove that $V_e$ is an unbiased estimate of $\sigma^2$. In other words, show that SSE = $Sx^2 + Sy^2$ − SST(unadj) − SSB(adj).

24.6. On page 363, show that the results for $\sigma_1^2$ and $\sigma_2^2$ are correct. Why is $(2V_b - V_e)$ an unbiased estimate of $(\sigma^2 + k\sigma_\beta^2)$?

24.7. Derive the formulas for the variances of the differences between 
two adjusted means, using a simple lattice design. Also, derive the 
formula for relative efficiency, 1. 

24.8. (a) Analyze the data in Exercise 19.4, using recovery of interblock information.

(b) What is the relative efficiency of this design compared with a randomized complete-blocks design?

CHAPTER 25


OTHER TOPICS CONCERNING COMPONENTS OF VARIANCE:
SUMMARY OF NEEDED RESEARCH

25.1. Covariance. Cochran considered the following covariance model for a sampling design with $p$ plots and $r$ samples per plot,

$$Y_{ij} = \mu + \tau_i^* + \beta x_{ij} + \epsilon_{ij},$$

where the $\tau^*$'s and $\epsilon$'s are assumed NID with zero means and respective variances $\sigma_\tau^2$ and $\sigma_\epsilon^2$, and $\mu$, $\beta$, and the $x$'s are assumed fixed. The maximum-likelihood estimate of $\beta$ is a weighted mean of the $b$'s derived from the plots ($b_t$) and error ($b_e$) lines of the analysis of covariance [see Sec. 21.2 (v) with treatments corresponding to plots].

If information about $\beta$ in the plots line is disregarded, the procedure is as follows:

(i) $s^{*2}$ is an unbiased estimate of $\sigma_\epsilon^2$ and SSE* is distributed as $\sigma_\epsilon^2\chi^2_{p(r-1)-1}$.

(ii)

$$\mathrm{SST}^* = r\sum_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot} - b_t x_{i\cdot})^2 + \frac{T_{xx}E_{xx}}{T_{xx} + E_{xx}}(b_t - b_e)^2,$$

$$\bar Y_{i\cdot} = \mu + \tau_i^* + \beta x_{i\cdot} + \bar\epsilon_{i\cdot}.$$

The first term is distributed as $(\sigma_\epsilon^2 + r\sigma_\tau^2)\chi^2_{(p-2)}$. Since

$$b_t = \beta + \frac{r\sum_i (\tau_i^* + \bar\epsilon_{i\cdot})x_{i\cdot}}{T_{xx}}, \qquad b_e = \beta + \frac{\sum_i \sum_j \epsilon_{ij}(x_{ij} - x_{i\cdot})}{E_{xx}},$$

the second term is distributed as $(\sigma_\epsilon^2 + \lambda r\sigma_\tau^2)\chi^2_1$, where $\lambda = E_{xx}/(T_{xx} + E_{xx})$.

(iii) All three chi-squares are independent.

(iv)

$$\hat\sigma_\tau^2 = \frac{(\mathrm{MST}^* - s^{*2})(p - 1)}{r(p - 2 + \lambda)}.$$

The estimate of $\sigma_\tau^2$ in (iv) involves some loss of information because plot information about $\beta$ is ignored, and the single degree of freedom in SST* is presumably given too much weight.

If the estimate of $\beta$ from the error line is used, parallel results can be obtained for a two-way classification, for example, randomized blocks.
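The estimate in (iv) can be traced back to the two chi-square pieces in (ii). The lines below are a sketch of that step, not a quotation from Cochran; the symbols $\sigma_\tau^2$ and $\sigma_\epsilon^2$ for the plot and sampling variances, and the form of the divisor, are assumptions made here for the illustration.

```latex
\begin{aligned}
E(\mathrm{SST}^*) &= (p-2)\,(\sigma_\epsilon^2 + r\sigma_\tau^2) + (\sigma_\epsilon^2 + \lambda r\sigma_\tau^2)
                   = (p-1)\,\sigma_\epsilon^2 + (p-2+\lambda)\,r\sigma_\tau^2, \\[4pt]
E(\mathrm{MST}^*) &= \frac{E(\mathrm{SST}^*)}{p-1}
                   = \sigma_\epsilon^2 + \frac{p-2+\lambda}{p-1}\,r\sigma_\tau^2, \\[4pt]
\hat\sigma_\tau^2 &= \frac{(\mathrm{MST}^* - s^{*2})\,(p-1)}{r\,(p-2+\lambda)} .
\end{aligned}
```

Since $0 < \lambda < 1$, the divisor lies between $r(p - 2)$ and $r(p - 1)$; the single degree of freedom from $(b_t - b_e)^2$ carries less information about $\sigma_\tau^2$ than each of the other $p - 2$.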

25.2. The Use of Components of Variance in Regression Problems. Variance components are also used to evaluate methods of estimating regression coefficients. The experimenter often obtains several values of Y for each value of X in order to determine the sampling or observational error (see Example 13.1 and Fig. 13.3). The model for $r$ independent variates with $p$ samples of Y for each set of $\{X_i\}$ values is

$$Y_{jk} = \mu + \sum_i \beta_i x_{ij} + \epsilon_j + \delta_{jk}, \qquad j = 1, 2, \ldots, n; \quad k = 1, 2, \ldots, p,$$

where $\epsilon$ and $\delta$ are NID with zero means and respective variances $\sigma_\epsilon^2$ and $\sigma_\delta^2$. $\sigma_\epsilon^2$ measures the failure of the regression line to go through the average Y values ($\bar Y_{j\cdot}$), while $\sigma_\delta^2$ measures the fluctuation of individual values about their means $\bar Y_{j\cdot}$.

The analysis of variance of the residuals will have the following appearance:

    Error                         Degrees of freedom    Mean square    E(MS)
    Deviations from regression        n − r − 1            V_e         $\sigma^2 = \sigma_\delta^2 + p\sigma_\epsilon^2$
    Observations                      n(p − 1)             V_d         $\sigma_\delta^2$

In this case the experimenter can determine whether he should add more sets of X's or collect more values of Y for each set, basing his conclusions on the relative size of $\sigma_\epsilon^2$ and $\sigma_\delta^2$ as compared with the relative costs of obtaining more sets or more observations per set.

Very few examples are available of planned experiments with the same number of observations per set (other than one observation). In many cases, there will be experiments with several values of Y for a given $X_1$, but the values of $X_2, X_3, \ldots$ vary as well as Y (see the vitamin B$_2$ example in Chap. 15). Hence we are led to use the one-error regression equation. Another difficulty often arises, namely, that the observation error tends to increase with increasing Y. This difficulty is not avoided when we use only one observation per set, but it is usually neglected (see Sec. 14.4). If several observations were taken for each set of X values, the experimenter would have some information as to the uniformity of his variation as Y increased.

The factorial design is an example of several observations for each set of X's; however, the X values are often qualitative rather than quantitative (different varieties, cultivation methods, teaching methods, etc.). When a factorial is used with different levels of the factors, the principle of multiple regression with several observations for each set of X values is being applied. However, there are seldom enough levels of the various factors to estimate the deviation error (for example, see Exercises 20.4, 20.6, and 20.7). The article referred to in Exercise 20.11 was based on estimating a quadratic regression of yield on planting date of cotton with 10 equally spaced planting dates. In this case $\hat\sigma_\delta^2 = .3600$ and $\hat\sigma_\delta^2 + p\hat\sigma_\epsilon^2 = .3357$, a value less than the estimate of $\sigma_\delta^2$. This is an all too common result: the estimate of $\sigma_\epsilon^2$ is negative. This indicates a need for a more thorough investigation for many regression problems of the accuracy of the assumptions of homogeneous variance from point to point and independence of the true residuals.

Example 25.1. A survey was conducted in eastern North Carolina in 1949 to estimate the relationship between per cent dry weight (Y) of Irish potatoes (Cobbler variety) and X = 200 (specific gravity − 1.045). The values of X were −1, 0, 1, . . . , 11, with 8 samples and 2 determinations per sample for many samples. If there had been potatoes for each of the 13 X classes for each sample and determination, there would have been 104 samples and 208 determinations. Unfortunately, there were few samples for X = −1 and X = 11 and many blanks in other places, so that only 127 determinations were obtained. The data were as shown in Table 25.1.


Table 25.1

Per cent dry weight by sample. Column headings: X = 0, 1, 2, 3, 4, 5, 6†, 7, 8, 9, 10, 11. Entries in each row are listed in order of increasing X; cells with no determination are omitted.

    Sample   Determinations
    1        14.34  14.73  16.30  17.46  18.82  19.48  20.40  20.67  21.75
             14.66  14.87  16.79  17.23  17.68  19.16  19.90  21.27  22.70
    2        14.12  15.04  16.50  16.75  17.90  18.91  20.35  20.72  22.28  23.45  24.98
             14.82  15.88  16.43  17.08  18.39  19.32  20.89  22.28  23.31  24.53
             19.68
    3        13.58  15.48  15.80  17.38  17.86  18.81  19.68  20.78  21.83  23.50  24.60
             13.68  14.07  15.60  16.80  18.30  18.84  19.90  20.86  22.17  23.39  24.80
    4        13.83  15.04  16.17  17.63  17.90  19.15  20.11  20.69
             13.92  15.36  16.10  16.68  17.88  20.18  21.14  21.05
    5        13.53  14.34  15.60  16.78  17.66  19.08  20.88  21.71  23.25  24.80
    6        15.30  15.60  16.63  17.98  19.38  20.56  22.78  24.80
    7        14.09  14.10  15.50  16.48  17.63  19.10  20.34  21.71  22.73  23.60  25.56
    8‡       13.86  14.63  15.14  16.49  17.10  18.51  19.18  20.26  21.44  22.43
             13.61  14.63  14.90  16.27  17.20  17.97  19.08  20.80  21.64

† Three determinations on the second sample.

‡ There was 1 determination = 13.05 for X = −1.








The analysis for the third sample is as follows:

    Source of variation    Degrees of freedom    Mean square    E(MS)
    Regression                     1               253.593
    Deviations                     9                  .1086     $\sigma_\delta^2 + 2\sigma_\epsilon^2$
    Determinations                11                  .1269     $\sigma_\delta^2$

Again we note a negative estimate of $\sigma_\epsilon^2$. For the 5 samples with duplicate determinations, 2 had negative estimates of $\sigma_\epsilon^2$, 2 had positive estimates, and 1 was almost zero (slightly positive).

An over-all regression analysis for all 8 samples gave $\hat\sigma_\delta^2 = .1362$ and $\hat\sigma_\delta^2 + \lambda\hat\sigma_\epsilon^2 = .1804$. The value of $\lambda$ is somewhere between 1 and 2. A rough approximation is the number of determinations divided by the number of samples, or about 1.6.
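The component estimates quoted in this example follow from subtracting the within mean square from the deviations mean square and dividing by the coefficient of $\sigma_\epsilon^2$. The sketch below uses a hypothetical function name, and it assumes that the over-all figure .1804 estimates $\sigma_\delta^2 + \lambda\sigma_\epsilon^2$ with $\lambda$ taken at the rough value 1.6.

```python
def deviation_component(ms_deviations, ms_within, coefficient):
    """Estimate sigma_epsilon^2 from E(MS_dev) = sigma_delta^2 + c * sigma_epsilon^2
    and E(MS_within) = sigma_delta^2, where c is the coefficient of the component."""
    if coefficient <= 0:
        raise ValueError("coefficient must be positive")
    return (ms_deviations - ms_within) / coefficient


# Third sample: two determinations per X class, so the coefficient is 2.
third_sample = deviation_component(0.1086, 0.1269, 2)

# Over-all analysis, with the rough value lambda = 1.6 (an assumption here).
over_all = deviation_component(0.1804, 0.1362, 1.6)

print(round(third_sample, 4), round(over_all, 4))
```

A negative value, as for the third sample, is usually read as an estimate of zero for the component, in line with the discussion above.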

Example 25.2. In Example 13.1, we used the total error sum of squares to estimate the error mean square to test whether or not $\beta$ was zero. If we disregard the fact that the error variance tended to increase with increasing X, we could subdivide the error sum of squares into two parts as follows:

    Error                             Degrees of freedom    Sum of squares    Mean square    E(MS)
    Deviations from regression               47                  407             8.66        $\sigma_\delta^2 + \lambda\sigma_\epsilon^2$†
    Between families for a given X          860                1,139             1.32        $\sigma_\delta^2$

† $\lambda$ is the average number of families for each X.

A rough approximation for $\lambda$ would be 909/49 = 18.6. A more accurate estimate of $\lambda$ is given by use of the formulas given in Sec. 22.4 (see Exercise 22.18),

$$\lambda = \frac{1}{47}\left(909 - \frac{\sum_i n_i^2}{909}\right),$$

where the $n_i$ are the number of families for $X = X_i$ (see Table 13.1). In this case $\sum n_i^2 = 33{,}161$ and

$$\lambda = \frac{1}{47}(909 - 36.5) = 18.6,$$

which is the same as above (to one decimal place).

If $Z = \log (Y + 1)$ is used in the regression equation, the error splits up as follows:

    Error                             Degrees of freedom    Sum of squares    Mean square
    Deviations from regression               47                 2.121            .0451
    Between families for a given X          860                22.429            .0261

In this case we have reduced the $\sigma_\epsilon^2$ part substantially by making the variances more nearly equal. But notice what happens for the incomes less than 14,000 when a log transformation is used. In this case $\hat\sigma_\delta^2 = .0214$, and $\hat\sigma_\delta^2 + \lambda\hat\sigma_\epsilon^2 = .0233$, indicating that $\sigma_\epsilon^2$ is practically zero. It would appear that extremely large estimates of $\sigma_\epsilon^2$ may be caused by unequal sampling variability along the regression line. Some theoretical study should be made on this point.
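The two values of $\lambda$ in this example can be recomputed directly. In the sketch below the variable names are hypothetical, the number of X classes (49) is inferred from the 47 degrees of freedom for deviations from a fitted line, and the refined formula is taken to be $(N - \sum n_i^2/N)/47$ with $N = 909$ families.

```python
n_families = 909          # total number of families (860 + 49)
n_classes = 49            # distinct X values, inferred as 47 + 2
sum_n_squared = 33161     # sum of squared class sizes quoted in the example
df_deviations = 47

# Rough value: average number of families per X class.
lambda_rough = n_families / n_classes

# Refined value, assuming the one-way formula with the regression degrees of freedom.
lambda_refined = (n_families - sum_n_squared / n_families) / df_deviations

# Component estimate from the first table: (8.66 - 1.32) / lambda.
sigma2_epsilon_hat = (8.66 - 1.32) / lambda_refined

print(round(lambda_rough, 1), round(lambda_refined, 1), round(sigma2_epsilon_hat, 2))
```

Both values of $\lambda$ round to 18.6, so the choice between them has little effect on the component estimate here.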

25.3. Other Topics in Relation to Variance Components. The industrial and engineering research men have need for variance components in studying the precision of measurements. This topic is still in the exploratory stage and is too complicated to be discussed here. If the reader is interested in precision of measurements, he is advised to read references 2 to 4. Cameron considers the use of variance components in determining the precision of estimating the clean content of wool, where several cores are taken from each of a number of bales of wool (see Exercises 25.4 and 25.5 for his technique).

Tukey considers the problems of estimating regression coefficients and variance components when both variates (X and Y) are subject to error, that is, each observed quantity = (steady part) + (fluctuation). In most practical cases, a third variate (an instrumental variate) with special properties is required. These special properties are that its covariance with the fluctuations in both X and Y vanish, but its covariance with the steady parts shall not vanish. The three fields of application where these problems have been most prevalent are indicated to be precision of measurements, psychology, and econometrics. We have not discussed problems of this nature in this book, because they are too complicated for an elementary treatment. However, the reader is advised to read Tukey's article at least to obtain some insight into the nature of the problem. He provides a list of references, which should also be useful.

Some recent articles have been written on the power of current tests of significance for variance components (see, for example, references 7 and 8). Cochran also discusses the power of his F" statistic, mentioned in Chap. 23.

25.4. Summary of Needed Research. In these last four chapters, 
we have presented a rather extensive treatment of a relatively new 
statistical tool—components of variance. However, this is actually 
the basis of all statistical concepts, because it deals with that particular 
aspect of data which requires statistical treatment— variability. We 
have here a method of identifying separate sources of variability and 
using estimates of these variances to plan future experiments, to make 
tests of significance, and to set confidence limits on a treatment or a 
general mean. 

Since this statistical tool is so new, adequate statistical theory has 
not been developed to apply it to all problems where it should be applied. 
Some of the problems in variance component analysis which should be 
studied are: 

(1) In a randomized complete-blocks model with all effects random except the mean, the assumption of independence of the main effects and interactions frequently does not appear to be justified. The assumption of additivity needs to be explored in detail, especially with interactions and main effects. Why do large main effects more often produce significant interactions than do small main effects? Can anything be done to correct this difficulty?

(2) A clear statement is needed of how to determine whether the 
interaction in a two-way classification is fixed or random. 

(3) A better method of handling finite populations is needed. 

(4) More exact methods of assigning confidence limits for variance 
components should be developed. 

(5) Better agreement is needed on how to handle the mixed model, 
which probably is the most important of all the models. 

(6) How can we detect correlated errors? And if we find that the 
errors are correlated, what should be done? 

(7) A study needs to be made of the efficiency of various methods of analyzing multiple classifications with unequal numbers in the subclasses. Also, short-cut computing techniques are needed, especially some which can be used with card-punching and electronic equipment.

(8) What can be done to simplify the analysis of data with unequal 
variances? 

(9) What are the effects of various types of nonnormality on the consistency and efficiency of estimates?

(10) What is the best test in a mixed model where the error must be estimated from several mean squares?

(11) Some study needs to be made of the proper allocation of samples in a nested sampling problem with limited resources and a need for good estimates of all components (see, for example, Exercise 22.19).

EXERCISES 

25.1. (a) Prove Cochran's results for the use of variance components in a covariance analysis, and apply them to Exercise 21.5.

(b) Derive the same results for a randomized complete-blocks experiment.

25.2. (a) In Example 25.1, determine the value of $\lambda$ by use of the regression model, as was done in Example 25.2.

(b) Estimate $\sigma_\epsilon^2$ for the other four samples with duplicate determinations.

(c) Show how the over-all estimates of $\sigma_\delta^2$ and $\lambda\sigma_\epsilon^2$ were obtained.

(d) What is the proper error term to use in testing for the significance of a single regression line?

(e) How would you set up a computing procedure to test the over-all regression and the deviations of sample regressions from the over-all regression?

(f) Is there much evidence that $\sigma_\delta^2$ was not constant from one sample to another?

25.3. (a) In Example 13.1, assuming $\sigma_\delta^2$ constant from X to X, what is the expectation of SSR? Hence, how should you test the hypothesis that $\beta = 0$?

(b) If a weighted analysis is used, what happens to the expectations of the mean squares?

(c) On the basis of (b), what advantages do you see in the use of a transformation instead of a weighted analysis, if the transformation can make the variances reasonably stable?

25.4. Given the model

$$Y_{ijk} = \mu + \alpha_i + \beta_{ij} + \gamma_{ijk},$$

where $\alpha_i$ represents the $i$th lot, $\beta_{ij}$ the $j$th bale in the $i$th lot, and $\gamma_{ijk}$ the $k$th core in the $(ij)$ bale. We shall assume all effects, except $\mu$, are NID with respective variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_\gamma^2$. Assume we have $2M$ lots with $n$ bales per lot and $k_1$ cores per bale for $M$ lots and $k_2$ cores per bale for the other $M$ lots. All cores for each lot are composited.

(a) Show that the variances per lot mean are

$$\sigma_\alpha^2 + \frac{\sigma_\beta^2}{n} + \frac{\sigma_\gamma^2}{nk_1} \quad\text{and}\quad \sigma_\alpha^2 + \frac{\sigma_\beta^2}{n} + \frac{\sigma_\gamma^2}{nk_2}.$$

(b) Given that $V_1$ is the mean square between the $M$ lots with $nk_1$ cores per lot and $V_2$ for the other $M$ lots. Show that, if $k_2 > k_1$,


(ndj + ^2) ^ 

K2 — /^i 

(c) How could you design a sample to estimate both $\sigma_\alpha^2$ and $\sigma_\beta^2$?

25.5. Use the same model as in Exercise 25.4. Assume that $n_i$ bales are selected from the $i$th of the first $M$ lots with 2 cores per bale, 1 core from each bale being composited to give 2 composite samples per lot. Call the difference between these 2 samples $d_i$. For the other $M$ lots, $2n_k$ bales are taken from each, and again 2 cores per bale; both cores from $n_k$ of the bales are composited, and similarly for the other bales, again giving 2 composite samples per lot. Call the difference between these two samples $d_k$.

(a) Show that $\sigma^2(d_i) = 2\sigma_\gamma^2/n_i$ and $\sigma^2(d_k) = (2\sigma_\beta^2 + \sigma_\gamma^2)/n_k$.

(b) Show that

$$\hat\sigma_\gamma^2 = \frac{\sum_{i=1}^{M} n_i d_i^2}{2M}, \qquad 2\hat\sigma_\beta^2 + \hat\sigma_\gamma^2 = \frac{\sum_{k=1}^{M} n_k d_k^2}{M}.$$

(c) What is the estimate of $\sigma_\beta^2$?

(d) What are the variances of the estimates of $\sigma_\beta^2$ and $\sigma_\gamma^2$?

Explanation of the Tables

Ia. Ordinates of the Normal Distribution. This table gives values of

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$$

for values of $x$ between 0 and i at intervals of .01. For negative values of $x$, one uses the relation $f(-x) = f(x)$.

Ib. Area under the Normal Curve, F(y). This table gives values of

$$F(y) = 1 - \frac{\alpha}{2} = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$$

for values of $y$ between 0 and 3.5 at intervals of .01. For negative values of $y$, one uses the relation $F(-y) = 1 - F(y) = \alpha/2$.

Values of $y$ corresponding to a few selected values of $\alpha$ are presented beneath the main table. These are useful in making tests of significance for a specified significance probability, $\alpha$, or in setting up confidence intervals with confidence probability $(1 - \alpha)$, as in Sec. 7.2 for the statistic $T_\alpha$.
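Entries of Table Ib can be spot-checked with the error function from the Python standard library. The sketch below is only a check on the definitions given above; the function names are hypothetical and are not part of the text.

```python
import math


def normal_area(y):
    """F(y): area under the standard normal curve from minus infinity to y."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def two_tail_alpha(y):
    """alpha = 2[1 - F(y)]: area in both tails beyond -y and +y, for y >= 0."""
    return 2.0 * (1.0 - normal_area(y))


for y in (0.0, 1.0, 1.645, 1.96, 2.576):
    print(y, round(normal_area(y), 4), round(two_tail_alpha(y), 3))
```

For negative arguments the same function reproduces the relation $F(-y) = 1 - F(y)$, so only the positive half needs to be tabulated.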

II. Percentage Points of the $\chi^2$ Distribution. This table gives values of $\chi^2_\alpha$ for selected values of $\alpha$ and for the number of degrees of freedom, $\nu$, ranging from 1 to 30, plus $\nu$ = 40, 50, 60, where

$$\alpha = \int_{\chi^2_\alpha}^{\infty} \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}(\chi^2)^{(\nu - 2)/2}e^{-\chi^2/2}\,d(\chi^2).$$

For larger values of $\nu$, a normal approximation is quite accurate. The quantity $\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ is nearly normally distributed with zero mean and unit variance. Hence $\chi^2_\alpha$ may be computed as

$$\chi^2_\alpha = \tfrac{1}{2}\left(T_{2\alpha} + \sqrt{2\nu - 1}\right)^2,$$

where we use $T_{2\alpha}$ because the probability for $\chi^2$ corresponds to a single tail of the normal curve (see Sec. 7.2 and Table Ib). For example, consider $\chi^2_{.05}$ for $\nu = 40$. The exact value is $\chi^2_{.05} = 55.8$, and the approximate value would be

$$\tfrac{1}{2}\left(1.645 + \sqrt{79}\right)^2 = 55.5.$$
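The worked value for 40 degrees of freedom can be reproduced in a line or two. This is a minimal sketch with a hypothetical function name; the normal deviate 1.645 is the tabulated $T_{.10}$, that is, $T_{2\alpha}$ for $\alpha = .05$.

```python
import math


def chi_square_point_approx(t_two_alpha, nu):
    """Normal approximation to the upper percentage point of chi-square:
    0.5 * (T_{2 alpha} + sqrt(2 nu - 1))^2, intended for large nu."""
    return 0.5 * (t_two_alpha + math.sqrt(2.0 * nu - 1.0)) ** 2


approx_40 = chi_square_point_approx(1.645, 40)
print(round(approx_40, 1))
```

The result agrees with the tabulated 55.8 to within about half a per cent, and the agreement improves as the number of degrees of freedom increases.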

III. Percentage Points of the t Distribution. This table gives values of $t_\alpha$ for selected values of $\alpha$ and for the following number of degrees of freedom, $n$: 1, 2, . . . , 30, 40, 60, 120, ∞, where

$$\alpha = 2\int_{t_\alpha}^{\infty} \frac{\Gamma[(n + 1)/2]}{\sqrt{\pi n}\,\Gamma(n/2)}\left(1 + \frac{t^2}{n}\right)^{-(n + 1)/2}dt.$$

If a percentage point is needed for some $n$ not given in the table, linear interpolation can be used between tabulated percentage points, but the reciprocals of $n$ should be used. For example, the value of $t_{.05}$ for 50 degrees of freedom would be

$$2.021 - \frac{\frac{1}{40} - \frac{1}{50}}{\frac{1}{40} - \frac{1}{60}}(.021) = 2.008.$$
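Interpolation on reciprocals is easy to mechanize. The sketch below uses a hypothetical function name; the tabulated points are 2.021 for 40 degrees of freedom and 2.000 for 60, the latter being implied by the difference of .021 used in the example.

```python
def interpolate_on_reciprocals(n, n_low, point_low, n_high, point_high):
    """Linear interpolation in 1/n between two tabulated percentage points.

    n_low < n < n_high are degrees of freedom; point_low and point_high are
    the tabulated values at n_low and n_high.
    """
    if not n_low < n < n_high:
        raise ValueError("n must lie strictly between the tabulated values")
    fraction = (1.0 / n_low - 1.0 / n) / (1.0 / n_low - 1.0 / n_high)
    return point_low - fraction * (point_low - point_high)


t_50 = interpolate_on_reciprocals(50, 40, 2.021, 60, 2.000)
print(round(t_50, 3))
```

Interpolating on $n$ itself would give 2.0105 instead, slightly farther from the exact percentage point, which is why the reciprocal scale is recommended.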

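IV. Percentage Points of the F Distribution. This table gives values of $F_\alpha$ for $\alpha$ = .01 and .05; $n_1$ = 1 (1) 10, 12, 16, 20, 24, 30, 50, 100, ∞; and $n_2$ = 1 (1) 20 (2) 28 (4) 40, 60, 100, 200, ∞. $F_\alpha$ is defined as follows:

$$\alpha = \int_{F_\alpha}^{\infty} \frac{\Gamma[(n_1 + n_2)/2]}{\Gamma(n_1/2)\,\Gamma(n_2/2)}\left(\frac{n_1}{n_2}\right)^{n_1/2}F^{(n_1 - 2)/2}\left(1 + \frac{n_1F}{n_2}\right)^{-(n_1 + n_2)/2}dF,$$

with $F = s_1^2/s_2^2$, where $s_1^2$ is a sample variance with $n_1$ degrees of freedom and $s_2^2$ a sample variance with $n_2$ degrees of freedom. The table also can be used to determine $F_{1-\alpha}$ by use of the formula

$$F_{1-\alpha}(n_1, n_2) = \frac{1}{F_\alpha(n_2, n_1)},$$

where the first $n$ indicates the number of degrees of freedom in the numerator and the second $n$ the number of degrees of freedom in the denominator. For example,

$$F_{.95}(3, 75) = \frac{1}{F_{.05}(75, 3)} = \frac{1}{8.57}.$$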


One should interpolate on the reciprocals of $n_1$ and $n_2$ as in Table III.

Table Ib

Area under the Normal Curve, F(y)†

      y     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
     .0    .5000   .5040   .5080   .5120   .5160   .5199   .5239   .5279   .5319   .5359
     .1    .5398   .5438   .5478   .5517   .5557   .5596   .5636   .5675   .5714   .5753
     .2    .5793   .5832   .5871   .5910   .5948   .5987   .6026   .6064   .6103   .6141
     .3    .6179   .6217   .6255   .6293   .6331   .6368   .6406   .6443   .6480   .6517
     .4    .6554   .6591   .6628   .6664   .6700   .6736   .6772   .6808   .6844   .6879
     .5    .6915   .6950   .6985   .7019   .7054   .7088   .7123   .7157   .7190   .7224
     .6    .7257   .7291   .7324   .7357   .7389   .7422   .7454   .7486   .7517   .7549
     .7    .7580   .7611   .7642   .7673   .7704   .7734   .7764   .7794   .7823   .7852
     .8    .7881   .7910   .7939   .7967   .7995   .8023   .8051   .8078   .8106   .8133
     .9    .8159   .8186   .8212   .8238   .8264   .8289   .8315   .8340   .8365   .8389
    1.0    .8413   .8438   .8461   .8485   .8508   .8531   .8554   .8577   .8599   .8621
    1.1    .8643   .8665   .8686   .8708   .8729   .8749   .8770   .8790   .8810   .8830
    1.2    .8849   .8869   .8888   .8907   .8925   .8944   .8962   .8980   .8997   .9015
    1.3    .9032   .9049   .9066   .9082   .9099   .9115   .9131   .9147   .9162   .9177
    1.4    .9192   .9207   .9222   .9236   .9251   .9265   .9279   .9292   .9306   .9319
    1.5    .9332   .9345   .9357   .9370   .9382   .9394   .9406   .9418   .9429   .9441
    1.6    .9452   .9463   .9474   .9484   .9495   .9505   .9515   .9525   .9535   .9545
    1.7    .9554   .9564   .9573   .9582   .9591   .9599   .9608   .9616   .9625   .9633
    1.8    .9641   .9649   .9656   .9664   .9671   .9678   .9686   .9693   .9699   .9706
    1.9    .9713   .9719   .9726   .9732   .9738   .9744   .9750   .9756   .9761   .9767
    2.0    .9772   .9778   .9783   .9788   .9793   .9798   .9803   .9808   .9812   .9817
    2.1    .9821   .9826   .9830   .9834   .9838   .9842   .9846   .9850   .9854   .9857
    2.2    .9861   .9864   .9868   .9871   .9875   .9878   .9881   .9884   .9887   .9890
    2.3    .9893   .9896   .9898   .9901   .9904   .9906   .9909   .9911   .9913   .9916
    2.4    .9918   .9920   .9922   .9925   .9927   .9929   .9931   .9932   .9934   .9936
    2.5    .9938   .9940   .9941   .9943   .9945   .9946   .9948   .9949   .9951   .9952
    2.6    .9953   .9955   .9956   .9957   .9959   .9960   .9961   .9962   .9963   .9964
    2.7    .9965   .9966   .9967   .9968   .9969   .9970   .9971   .9972   .9973   .9974
    2.8    .9974   .9975   .9976   .9977   .9977   .9978   .9979   .9979   .9980   .9981
    2.9    .9981   .9982   .9982   .9983   .9984   .9984   .9985   .9985   .9986   .9986
    3.0    .9987   .9987   .9987   .9988   .9988   .9989   .9989   .9989   .9990   .9990
    3.1    .9990   .9991   .9991   .9991   .9992   .9992   .9992   .9992   .9993   .9993
    3.2    .9993   .9993   .9994   .9994   .9994   .9994   .9994   .9995   .9995   .9995
    3.3    .9995   .9995   .9995   .9996   .9996   .9996   .9996   .9996   .9996   .9997
    3.4    .9997   .9997   .9997   .9997   .9997   .9997   .9997   .9997   .9997   .9998

Percentage Points of the Normal Distribution†

    F(y)               .75     .90     .95     .975    .99     .995    .999    .9995   .99995  .999995
    α = 2[1 − F(y)]    .50     .20     .10     .05     .02     .01     .002    .001    .0001   .00001
    T_α               0.674   1.282   1.645   1.960   2.326   2.576   3.090   3.291   3.891   4.417


t F{y) is the area under the normal curve from - to y; a is twice the area from y 
to 00 (area from — co to —y plus the area from y io oo). 
Table III

Percentage Points of the t Distribution †

                                         α
n        .50     .40     .30     .20     .10     .05     .02     .01    .001

1      1.000   1.376   1.963   3.078   6.314  12.706  31.821  63.657 636.619
2       .816   1.061   1.386   1.886   2.920   4.303   6.965   9.925  31.598
3       .765    .978   1.250   1.638   2.353   3.182   4.541   5.841  12.941
4       .741    .941   1.190   1.533   2.132   2.776   3.747   4.604   8.610
5       .727    .920   1.156   1.476   2.015   2.571   3.365   4.032   6.859
6       .718    .906   1.134   1.440   1.943   2.447   3.143   3.707   5.959
7       .711    .896   1.119   1.415   1.895   2.365   2.998   3.499   5.405
8       .706    .889   1.108   1.397   1.860   2.306   2.896   3.355   5.041
9       .703    .883   1.100   1.383   1.833   2.262   2.821   3.250   4.781
10      .700    .879   1.093   1.372   1.812   2.228   2.764   3.169   4.587
11      .697    .876   1.088   1.363   1.796   2.201   2.718   3.106   4.437
12      .695    .873   1.083   1.356   1.782   2.179   2.681   3.055   4.318
13      .694    .870   1.079   1.350   1.771   2.160   2.650   3.012   4.221
14      .692    .868   1.076   1.345   1.761   2.145   2.624   2.977   4.140
15      .691    .866   1.074   1.341   1.753   2.131   2.602   2.947   4.073
16      .690    .865   1.071   1.337   1.746   2.120   2.583   2.921   4.015
17      .689    .863   1.069   1.333   1.740   2.110   2.567   2.898   3.965
18      .688    .862   1.067   1.330   1.734   2.101   2.552   2.878   3.922
19      .688    .861   1.066   1.328   1.729   2.093   2.539   2.861   3.883
20      .687
analysis of variance, 172 
computational methods, 191 
confidence limits of estimates, 203 
curvilinear regression, 207 

(See also Orthogonal polynomials) 
Doolittle method, example, 192 
exercises, 187-189, 203-206 
normal equations, 169 
regression model, 168 
tests of significance, theory of, 177 
unequal variances, 182 
variance components, 370-372 
Regression coefficients, 168 
Regression mean square (MSR), 160 
Regression model, 156, 168, 207, 211, 370 
Regression sum of squares (SSR), 157, 
171, 196, 198, 212 
Reid, W. A., 205, 206 
Replications, 227 
hidden, 268 
Residual, 154, 168 

Rigney, J. A., 59, 66, 309, 311, 333, 
335-337 

Robertson, D. W., 138, 149 
Robinson, H. F., 261, 266, 313, 330, 335 
Roth, L., 18 

S 

Sample, 59 

amount of information in, 97 
large, 71, 93-95 
Sample surveys, 218 

variance components, 326, 327 
Sasuly, Max, 207, 216 
Satterthwaite, F. E., 83, 90, 314, 321, 322, 
335, 347, 350, 357 

Satterthwaite approximation, comparing 
two means with unequal variances, 
83 

confidence limits, for means, 322, 347 
for variance components, 321, 332 
F test for mixed model, 350 
Saunders, A. R., 288, 295 
Scholl, J. C., 202, 206 
Schumacher, F. X., 60, 66 
Second-order interaction, test of, 144-149 
Serial correlation, 222 
Significance level, 79, 84 


Simple lattice designs, fixed block effects, 
256-261 

analysis of variance, 258 
example, 259 
exercise, 264 

variances of mean differences, 259 
random block effects, 361-366 
adjusted treatment means, 364 
efficiency, 364 

estimates of variance components, 
362 

example, 365 
exercises, 367 
regression model, 361 
variances of mean differences, 364 
Simple test, 116-119 
Skewness, α₃, 33 
γ₁, 38 

Smith, A. L., 367 

Smith, H. F., 83, 90, 357, 373, 376 
Snedecor, G. W., 8, 22, 29, 60, 66, 137, 
149, 169, 190-192, 206, 207, 216, 217, 
225, 243, 248, 275, 278, 279, 282, 295, 
296, 301, 306, 310, 311, 347, 357, 387 
Split-plot designs, 344-351 
analysis of variance, 345 
examples, 347, 349 
exercises, 352-356 
experimental models, 345, 347 
uses of, 347 

variances of mean differences, 345-346 
Sprague, G. F., 336 
Standard deviation, 33 
Statistical inference, 113 
Statistical Research Group, Columbia 
University, 226 

Statistics, definition of, 3, 6 
Stewart, H. A., 367 
Stirling’s approximation, 11 
Stochastic convergence, 93 
“Student” (W. S. Gosset), 226 
Sufficient estimators, 96 
Sukhatme, P. V., 82, 90, 337 
Sum, of cross-products, 157, 191 
of squares, distribution of, 74, 76 
due to regression (see Regression sum 
of squares) 

Sweep-out method, 280 
Systematic sampling, 59 






T 

t, 77 

distribution of, 77-83 
problem of unequal variances, 81-83 
uses, confidence limits of means, 232, 
239, 346 

testing means, 78, 81 
testing regression coefficients, 160, 
178, 196 

Tang, P. C., 123, 129, 220, 225 
Tanner, L., 377 

Tchebysheff, P. L., 93, 207, 216 
Tchebysheff’s inequality, 93-94 
Test criterion, 113 
Tests of hypotheses, 113-130 
composite, 119-121 
power of, 114-115 
simple, 116-119 
unbiased, 114 

uniformly most powerful, 114, 119 
(See also names of tests, as Chi-square 
test) 

Tests of significance (see Tests of hypotheses) 

Thompson, C. M., 73, 90, 383 
Tippett, L. H. C., 60, 66, 226, 336 
Tokarska, B., 121, 122, 129 
Transformations, 222 
Treatments, 217 

comparison of means, complete-blocks 
designs, 232, 236 
covariance, 299 

incomplete-blocks designs, 251, 254, 
259, 359 

split-plot designs, 346 
effects, 218, 297, 358, 363 
linear functions, 241 
sum of squares and mean square, complete-blocks designs, 230, 236 
covariance, 301, 369 
factorial experiments, 269, 281 
incomplete-blocks designs, 251, 252, 
258 

Triangular distribution, 28 
Triple lattice designs, 256, 366 
True regression, 168 
True regression line, 154, 155 
Tsao, F., 308, 311, 352, 357 


Tukey, J. W., 221, 225, 239, 248, 344, 357, 
373, 376, 377 
Type I error, 115 
Type II error, 115 

U 

Unbiased tests, 114 
Unequal subclass numbers, 278 
Unequal variances, 81-83, 141-144, 182, 
189, 373 

Uniformly most powerful test, 114, 119 
Unweighted means, method of, 279 
Uspensky, J. V., 18 

V 

Variance, analysis of (see Analysis of 
variance) 

of sample, mean, one component of 
error, 61, 69, 157, 231 
several components of error, 318, 
325, 351 

mean difference, complete-block designs, 81, 232, 236 
covariance, 299, 302, 306, 307 
factorial experiments, 272 
incomplete-blocks designs, 251, 
254, 259, 359, 364 
mixed model, 341, 344-346, 351 
regression coefficients, 157, 170, 196, 
215 

variance components, 321, 342 
Variance components, 313, 314 

all effects random except mean, 313- 
337 

confidence limits, 321 
estimates of components, 319, 326 
examples, 323, 326, 328 
exercises, 330-335 
interactions, 314 
nested sampling, 325, 327 
relative importance of components, 
318 

mixed model, 338-357 

estimates of components, 341 
examples, 343, 347, 349 
exercises, 351-356 
split-plot designs, 344-351 
variances of mean differences, 341 




Variance components, models, 313 
needed research, 374 
power of tests, 373, 374 
precision of measurements, 373 
recovery of interblock information, 
358-368 

balanced designs, 358-361 
lattice designs, 361-366 
regression analysis, 369-373 
Variance estimators, 61, 93 
Variate, 19 

dependent, 153 
independent (fixed), 152 

W 

Wakeley, J. T., 191, 204, 206 
Wald, A., 130, 160, 167, 336, 377 
Watson, G. S., 160, 167, 222, 225 
Waugh, F. V., 191, 206 
Weber, C. R., 336 

Weighted squares of means, method of, 
279 

Weinberg, S. W., 207, 216 


Weiss, M. G., 266 

Welch, B. L., 82, 90, 248, 377 

Whitwell, J. C., 373, 376 

Whitworth, W. A., 18 

Wilcoxon, F., 4, 7, 8 

Wilks, S. S., 30, 108, 112, 190, 312 

Wilm, H. G., 349, 357 

Winsor, C. P., 160, 161, 167, 336 

Wishart, J., 226, 247, 248, 312, 353, 357 

Woltz, W. G., 205, 206 

Y 

Yates, F., 60, 66, 85, 90, 137, 149, 178, 
190, 211, 215, 216, 218, 221, 225, 226, 
250, 253, 257, 261, 265, 266, 268, 271, 
278, 279, 285, 287, 295, 296, 313, 335, 
337, 357, 358, 367, 368, 385 
Yield, 217 

Youden, W. J., 226, 262, 266, 296 
Z 

Zacopanay, L, 133, 335 
Zuber, M. S., 368 






